I. INTRODUCTION


A.  Motivation  and  Perspective 

This  thesis  is  concerned  with  some  problems  that  arise  in 
connection  with  the  statistical  multiplexing  of  voice  and  data. 
The  motivation  derives  from  the  observation  that  conversational 
speech actually consists of alternating periods of sound production
and silence. The most readily identifiable source of
this  behavior  is  the  alternation  of  dominance  that  naturally 
occurs  as  two  parties  converse.  However,  even  the  speech  of 
the  party  considered  the  "talker"  is  not  a  continuous  stream 
of  sound  if  one  looks  on  a  finer  time  scale.  Rather,  it  too 
consists of an alternating sequence of talkspurts (which can
comprise a few words or phrases) and silences. On even finer
time  and  frequency  scales,  one  can  observe  that  a  talkspurt 
itself  does  not  always  occupy  the  full  "long-term"  speech 
bandwidth,  which  is  approximately  between  0  and  4000  Hz. 

The  success  of  any  capacity  sharing  scheme  that  exploits 
these  characteristics  depends  on  a  variety  of  physical  and 
statistical  considerations  and  their  interplay.  For  the 
sake of discussion, consider a model system in which N
speakers share $C_V$ "units" of capacity. Each speaker alternates
between an active state, in which one unit of capacity
is required, and a silent state, in which no capacity is
required. The system operates as follows. Each speaker is
connected to an activity detector and a switch. When the
detector observes a transition from silence to activity, the
switch connects the speaker to a unit of capacity, if one is
available. The connection is maintained until silence occurs.
If capacity is not available, the speaker is simply locked out
and joins a pool of other active, waiting speakers. This pool
is served on a first-come, first-served (FCFS) basis as capacity
becomes available. During the lockout period, speech is lost,
and a speaker who becomes silent while waiting departs the
pool.

One useful performance measure of this system is the
average loss or cut-out fraction, defined as

$$\phi = \frac{\text{time-averaged loss}}{\text{time-averaged offered load}}$$

The value of $\phi$ is determined by what are effectively the
physical and statistical "response times". The physical response
time is characterized by the switching time, $\tau_s$, and $\tau_d$, which
is the "window" that the detector needs to accurately track
activity. For successful operation, it is clear that $\tau_d + \tau_s$
must be much smaller than the average active time and average
silence time ($\tau_{AC}$ and $\tau_{SL}$, respectively). If this condition
is met, one must look at the "statistical" loss incurred in
waiting for capacity. This depends on N, $C_V$, and the statistical
behavior of the speakers. In the ideal case of
$\tau_s = \tau_d = 0$, and with suitable assumptions about the activity
process, one can show

$$\phi = \frac{\sum_{\ell = C_V + 1}^{N} (\ell - C_V)\, p_\ell}{\sum_{\ell = 1}^{N} \ell\, p_\ell}$$

where $p_\ell$ is the "equilibrium" probability that $\ell$ speakers are
active. See Weinstein [1]. We will elaborate on this later.
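
As a hedged numerical illustration of the cut-out fraction, the
sketch below evaluates the ratio of the expected overflow (the sum
over l > C_V of (l - C_V) p_l) to the expected offered load (the sum
of l p_l). It assumes that speakers are independent and that each is
active with probability p_active, so that p_l is binomial. This
distributional choice, and all function and parameter names, are
assumptions made for illustration rather than definitions taken from
the text.

```python
from math import comb


def active_distribution(n_speakers, p_active):
    """Return [p_0, ..., p_N], assuming independent speakers (binomial)."""
    return [
        comb(n_speakers, l) * p_active**l * (1 - p_active) ** (n_speakers - l)
        for l in range(n_speakers + 1)
    ]


def cutout_fraction(n_speakers, capacity, p_active):
    """Expected overflow divided by expected offered load (requires 0 < p_active)."""
    p = active_distribution(n_speakers, p_active)
    # Speech is lost only when more than `capacity` speakers are active.
    lost = sum((l - capacity) * p[l] for l in range(capacity + 1, n_speakers + 1))
    offered = sum(l * p[l] for l in range(1, n_speakers + 1))
    return lost / offered


if __name__ == "__main__":
    # Hypothetical example: 40 speakers, 24 units, 40% activity.
    print(cutout_fraction(40, 24, 0.4))
```

With capacity equal to the number of speakers nothing is lost, and
with zero capacity everything is lost, which gives two quick sanity
checks on the sketch.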

In the above model, a unit of capacity depends on the
context. On the talkspurt-silence level, it is the capacity to
handle the full "bit rate" of an encoded talkspurt. On a finer
scale, it may refer to some sub-band of the full speech bandwidth,
in which case the loss is only for that sub-band.

The physical problems of frequency sub-band multiplexing are
formidable because of the small times involved. That is, the
short-term bandwidth of a talkspurt moves around rapidly
within the long-term spectrum, so that the detection problem
is hard. However, activity detection on the talkspurt-silence
level is quite feasible and has been implemented. For example,
in the late 1950's, Bell Telephone built the TASI (an acronym
for time assigned speech interpolation) system for use on
transoceanic cables. Thus, the ratio $N/C_V$ is referred to as
the TASI advantage. Although the original system was in an
analog environment, we will use TASI as a generic term for any
such statistical multiplexing scheme. All future discussion
will pertain only to the talkspurt-silence level.

TASI  is  successful  because  little  speech  is  lost  during 
detection and switching. (Typically, $\tau_s$ and $\tau_d$ are on the order
of 10-20 ms, whereas $\tau_{AC}$ and $\tau_{SL}$ are on the order of 1 s.) In
the next section, we will see that digital switching between
voice and data on these time scales is no harder, and one can
consider  transmitting  data  during  the  silences  as  well.  This 
is  the  main  topic  of  the  thesis.  Because  voice  traffic  must 
meet  certain  rather  stringent  delay  requirements,  data  must 
have  a  lower  priority  to  some  degree.  Thus,  the  data  queue 
effectively  sees  a  server  whose  rate  is  strongly  governed 
by  speaker  activity.  The  main  question  will  be  whether  voice 
activity  returns  from  "high"  levels  to  its  "mean"  level 
sufficiently  rapidly,  so  that  data  backlogs  which  accumulate 
while  voice  activity  is  high,  can  be  emptied  in  "reasonable" 
time.  Before  delving  into  this,  we  will  spend  the  next  few 
sections  discussing  some  implementation  considerations  and 
voice  activity  models. 
B. Network and Implementation Considerations

In  a  network  setting,  the  previous  analysis  applies  only 
to  an  end-to-end  or,  conceptually,  a  single-link  version  of 
TASI. That is, if N callers at a node A share $C_V$ "channels"
that  transmit  their  speech  to  B,  and  each  channel  can  only 
accommodate  one  active  speaker,  then  the  cutout  fraction 
is  the  fraction  of  output  that  does  not  reach  B.  Since  all 
allocation  decisions  are  made  at  A,  the  physical  realization 
of  these  channels  is  not  important  in  the  analysis;  they  can 
be  viewed  as  a  single  link  of  equivalent  capacity.  In  a  real 
network,  connections  between  users  often  comprise  multihop 
paths,  and  a  given  link  is  usually  a  part  of  paths  between 
many sources and destinations. Thus, one can conceive of
"network TASI" in which the output of a caller can be preempted
at any node on the path it follows, and all nodes
cooperate  in  globally  allocating  capacity. 

With  analog  transmission  and  electromechanical  switching 
(i.e.  relays)  or  even  digital  transmission  and  switching  with 
semiconductor  logic  gates,  the  nodes  have  neither  the  time 
nor  the  processing  power  to  make  the  necessary  decisions. 
Therefore, early TASI systems were indeed single-link operations
used to increase the "virtual" voice capacity of relatively
expensive backbone trunks (such as transoceanic cables).
With digitized speech and the current "software" switching
technology, network TASI is possible to implement. In fact,
digitized speech, with activity detection, may be viewed as
another type of bursty "data" traffic whose arrival statistics
and delay requirements are different from those of conventional
types. Thus, the network TASI problem is only part of an integrated
voice/data network's overall allocation problem.

Network  resource  allocation  problems  are  difficult,  and 
one  usually  cannot  conduct  a  detailed  queuing  or  loss  analysis. 
Instead,  one  often  attempts  to  separate  the  "probabilistic"  from 
the  "networking"  issues.  For  example,  one  might  explore  the 
"networking"  aspects  of  minimum  delay  routing  problems  by 
assuming  that  the  average  queuing  delay  (at  node  i)  of  traffic 
using  link  (i,j)  depends  only  on  the  link  capacity  and  average 
flow  on  the  link.  Here,  a  "networking"  question  is,  for 
example,  how  should  the  nodes  cooperate  to  find  best  routes 
given  that  each  initially  knows  only  the  flows  on  its  links? 

A single-link or tandem-link queuing analysis can be used to
explore the validity of the assumption, i.e. can higher order
moments of the flows be neglected in computing average delay,
can statistical dependencies between queues be neglected, etc.

In this approach, prior knowledge of the particular network
architecture or quantity of interest can sometimes be
used to tailor the single link model so that one can focus
on specific issues. This is not done in the thesis. That is,
we will use a general model of a single voice/data link and
analyze a variety of quantities that might later be used in
network approximations. Nevertheless, it is helpful to first
have a qualitative understanding of the transport requirements
of voice and data and of various switching disciplines.

Our  discussion  of  these  issues  will  be  conducted  in  the 
context  of  the  following  time-division  multiplexed  (TDM)  switch 
architecture  and  transmission  format.  Where  we  give  values  for 
certain  parameters,  these  values  reflect  our  understanding  of  the 
capabilities  of  current  technology.  This  has  come  primarily 
through  "private  discussion",  so  we  do  not  provide  specific 
references.  At  the  end,  we  do  indicate  some  tradeoffs  affecting 
the  choices  of  values  for  these  parameters.  A  more  detailed 
technical survey and bibliography can be found in [2].

In the TDM architecture, time is divided into units of
length $\tau$ called frames. (Typically, $\tau \approx$ 10-50 ms.) The "atomic"
unit of transmission is the block, which consists of b bits.
We say that a link has capacity $C_T$ blocks/frame if the associated
node→link→node combination has the processing and transmission
capacity to handle $C_T$ blocks/frame on a pipelined basis.

The  meaning  of  this  can  be  understood  with  the  aid  of  the 
following  diagram. 


[Figure: timing diagram; labels "Depart node 1" and "Arrive".]

During every frame, node 1 "enters" $C_T \cdot b$ bits into the channel.
Consecutive groups of b bits are viewed as blocks occupying
slots in time. Block departures at node 1 are taken as point
events occurring when the last bit of a block enters the channel.
The arrival process at node 2 is viewed similarly, and an event
occurs when the last bit of a block is in node 2's memory and
ready for further transmission. The time between block departure
and arrival is called the link delay. (We assume link delays
do not change in time.) In practice, this might consist of
physical propagation time and some processing time to get the
bits into memory. If the link delay is larger than $\tau$, there
will be at least one complete frame of blocks in the pipeline
at any instant. This will usually be the case on a geosynchronous
satellite link because the round trip propagation time
is about 0.25 sec. (The altitude of the geosynchronous orbit is
about 23,000 miles.) By "pipelined on a block basis", we mean
that the first bit of a block cannot leave a node until the
last bit of that block has arrived. Thus, the first bit incurs
a delay which is the sum of the link delay and the duration of
a slot.
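
The pipelining arithmetic can be sketched as follows. The sketch
assumes, for illustration only, that the blocks carried per frame
are equal-length slots that exactly fill the frame, so that one slot
lasts the frame length divided by the link capacity in blocks/frame.
The function names and the example numbers are hypothetical.

```python
def slot_duration(frame_length, blocks_per_frame):
    """Duration of one slot, assuming equal slots exactly fill a frame."""
    return frame_length / blocks_per_frame


def first_bit_delay(link_delay, frame_length, blocks_per_frame):
    """Delay of a block's first bit: link delay plus one slot duration."""
    return link_delay + slot_duration(frame_length, blocks_per_frame)


def complete_frames_in_pipeline(link_delay, frame_length):
    """Lower bound on the number of whole frames in flight on the link."""
    return int(link_delay // frame_length)


if __name__ == "__main__":
    # Hypothetical satellite link: 0.25 s link delay, 20 ms frames,
    # 100 blocks per frame.
    print(complete_frames_in_pipeline(0.25, 0.02))
    print(first_bit_delay(0.25, 0.02, 100))
```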

In  a  network  of  these  switches,  this  slotted  frame  format 
is  used  on  every  (directed)  link,  though  we  do  not  assume  that 
frame boundaries are globally aligned; $\tau$ and b are the only
global constants. We do assume for now that links are noiseless.

Sources are connected to the network through a host node,
and the host and source communicate through a source buffer.

At  the  destination  node  there  is  a  decoder  which  uses  the 
arriving bits to reproduce the source's messages for some
"final" user. We consider the decoder to be outside the network.
The time that a source is connected to the network is
called  the  session.  Two  types  of  users,  "voice"  and  "data", 
are  considered. 

Note:  The  callers  of  a  two-way  conversation  are  treated  as 
independent  sources,  and  each  speaker's  output  is  viewed  as 
a  sequence  of  "one-way"  messages.  To  allow  them  to  sustain 
normal  conversation,  the  network  must  meet  certain  delay 
requirements. Other than this it does not "recognize" them as
interacting  users.  These  requirements  will  be  discussed  later. 
All  data  messages  are  "one-way". 

A  data  source  places  bits  into  the  source  buffer  in  an 
arbitrary  manner.  These  bits  are  viewed  as  a  sequence  of 
messages as prescribed by the user. The network can transport
individual  bits  as  it  chooses,  as  long  as  the  following 
requirements  are  met. 

1)  No  loss  -  All  bits  must  be  delivered  to  the  decoder 
in  correct  order. 

2)  The  network  must  separate  messages  for  the  decoder. 

3) End-to-end delays of messages must meet some (possibly
statistical) requirements. The delay of a message is
the time between the entry of its last bit into the
source buffer and the delivery of this bit to the
decoder.

Speech is digitized at a rate $b/\tau$ bits per sec. The encoder
also performs activity detection. That is, a block is placed
into the buffer at the end of a frame only if the encoder determines
that the speaker was active during that frame. (We view this
placement as a point event. Also, frame boundaries at the encoder
and host node need not be aligned.) We assume that speech
loss caused by incorrect activity decisions is negligible. The
first block of a talkspurt is marked and contains a number indicating
the duration (in frames) of the preceding silence.
(Obviously the actual speech digitization rate must be reduced
slightly to accommodate such overhead. We neglect this.)

Characterizing  the  delay  requirements  of  voice  is  difficult 
because  the  "message"  is  not  clearly  defined.  Theoretically, 
speakers  could  communicate  via  a  sequence  of  "voice  telegrams", 
where  each  telegram  contains  "a  thought"  and  can  be  delayed  a 
few  seconds.  This  is  not  the  same  as  "normal"  conversation,  in 
which  speakers  implicitly  use  silences  to  separate  "messages". 
Approximately 250 ms is usually given as the maximum acceptable
delay (the time between the beginning of a talkspurt at the
source and the beginning of its reproduction by the decoder).
With larger delays, speakers "collide" and must resort to
explicit phrases to separate "messages".

In  the  single-link  TASI  system  first  described,  all  loss 
occurs in some initial segment of a talkspurt. This results in
clipping that, apparently, is not noticed if the average loss
is below 0.5%. (See Weinstein [1].) We will see that with
"network TASI", loss can occur at various places in a talkspurt
and  speakers  might  tolerate  larger  average  loss  provided  it  is 
scattered.  Although  this  is  an  "experimental  question",  we 
note  that  if  the  "loss  vs  maximum  TASI  advantage"  curve  is 
relatively  flat,  small  increases  in  acceptable  loss  result  in 
relatively  large  increases  in  the  maximum  TASI  advantage.  The 
single  link  "loss  vs  TASI  advantage"  curve  will  be  discussed 
in  I.D. 
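
To make the relation between acceptable loss and TASI advantage
concrete, the sketch below searches for the largest number of
speakers that a fixed capacity can carry without exceeding a loss
threshold. It assumes independent speakers with a binomial activity
distribution, and it takes the loss to be the expected overflow
divided by the expected offered load; the activity probability, the
thresholds, and all names are illustrative assumptions rather than
values from the thesis.

```python
from math import comb


def cutout_fraction(n_speakers, capacity, p_active):
    """Expected overflow / expected offered load, binomial activity assumed."""
    p = [
        comb(n_speakers, l) * p_active**l * (1 - p_active) ** (n_speakers - l)
        for l in range(n_speakers + 1)
    ]
    lost = sum((l - capacity) * p[l] for l in range(capacity + 1, n_speakers + 1))
    offered = sum(l * p[l] for l in range(1, n_speakers + 1))
    return lost / offered


def max_tasi_advantage(capacity, p_active, max_loss):
    """Largest N/capacity whose loss stays at or below max_loss.

    Assumes 0 < p_active < 1 and 0 <= max_loss < 1, so the search stops.
    """
    n = capacity  # with N equal to the capacity there is no loss at all
    while cutout_fraction(n + 1, capacity, p_active) <= max_loss:
        n += 1
    return n / capacity


if __name__ == "__main__":
    for loss in (0.005, 0.01, 0.02):
        print(loss, max_tasi_advantage(24, 0.4, loss))
```

Running the loop for several thresholds gives a few points of a
"loss vs maximum TASI advantage" curve for the assumed parameters.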

We  now  discuss  the  implementations  of  three  "standard" 
switching  disciplines  within  the  TDM  architecture  and  their 
uses  with  voice  and  data  sources. 

1.  Circuit  Switching  -  Conceptually,  a  circuit  is  a 
guarantee  of  a  path  of  specified  capacity  from  the  source  to 
the  destination  for  the  entire  session.  The  network  attempts 
to  establish  or  set  up  a  path  when  the  user  arrives.  If  it 
cannot  do  so  within  "reasonable"  time,  the  user  is  rejected. 
Therefore, circuit-switched networks are designed to meet a
rejection  probability  requirement. 

Historically,  circuits  were  implemented  using  analog 
transmission  and  "hardwired"  connections  at  switches.  In  the 
TDM  architecture,  a  "unit"  capacity  circuit  is  a  guarantee  of 
a  slot  in  every  frame  at  every  node  along  some  path  between  the 
source and destination. At set-up time, the nodes establish
tables with entries of the form "slot i on incoming link x
corresponds to slot j on departing link y". The particular
choices for i and j are unimportant, but they must remain
fixed while that path serves the circuit. Note that no addressing
is required because the correspondence between circuit and
path implicitly identifies the source and destination. (Intermediate
nodes need not even know who the end users are.)
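
A minimal sketch of such a per-node table is given below. It maps
an (incoming link, slot) pair to a (departing link, slot) pair and
carries no source or destination address. The class and method names
are hypothetical, and the rule of refusing a set-up that would reuse
a slot is an assumption added for illustration.

```python
class CircuitSwitchNode:
    """Holds the slot-correspondence table of one TDM circuit-switching node."""

    def __init__(self):
        # (incoming link, incoming slot) -> (departing link, departing slot)
        self.table = {}

    def set_up(self, in_link, in_slot, out_link, out_slot):
        """Reserve a correspondence; it stays fixed until torn down."""
        key, value = (in_link, in_slot), (out_link, out_slot)
        if key in self.table or value in self.table.values():
            raise ValueError("slot already reserved for another circuit")
        self.table[key] = value

    def forward(self, in_link, in_slot):
        """Return the departing (link, slot), or None if the slot is unreserved."""
        return self.table.get((in_link, in_slot))

    def tear_down(self, in_link, in_slot):
        """Release the correspondence when the circuit ends."""
        self.table.pop((in_link, in_slot), None)


if __name__ == "__main__":
    node = CircuitSwitchNode()
    node.set_up("x", 3, "y", 7)
    print(node.forward("x", 3))
```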

With  analog  circuits,  the  end-to-end  delay  is  a  sum  of  the 
link  delays,  and  the  delay  is  the  same  for  each  increment  of 
the  input  signal.  With  TDM  circuits,  there  can  be  buffering 
delay  because  a  block  must  wait  for  the  appropriate  departing 
slot  at  each  node.  The  exact  value  of  this  delay  at  a 
particular  node  depends  on  the  particular  incoming  and  departing 
slots  serving  the  circuit  and  the  relative  alignments  of  frame 
boundaries.  But  because  the  next  occurrence  of  the  appropriate 
slot must be within $\tau$ secs after the arrival of a block, this
delay is at most $\tau$ at any node. Further, this delay is the
same for all blocks since the slot correspondences are fixed.
As link delays are also constant (with time), it follows
that all blocks incur the same end-to-end delay. In this
sense, circuit switching offers synchronous service.
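
The constant-delay argument can be checked with a small sketch. It
assumes integer times (say, microseconds), a fixed frame offset and
departing slot at each node, and it ignores the transmission time of
the block itself; these simplifications and all names are
illustrative assumptions.

```python
def buffering_delay(arrival_time, out_frame_offset, out_slot,
                    slot_duration, frame_length):
    """Wait until the next occurrence of the reserved departing slot."""
    slot_start = out_frame_offset + out_slot * slot_duration
    # Python's % returns a value in [0, frame_length) for a positive modulus.
    return (slot_start - arrival_time) % frame_length


def path_delay(entry_time, hops, frame_length):
    """End-to-end delay over hops of (link_delay, offset, slot, slot_duration)."""
    t = entry_time
    for link_delay, offset, slot, slot_dur in hops:
        t += buffering_delay(t, offset, slot, slot_dur, frame_length)
        t += link_delay
    return t - entry_time


if __name__ == "__main__":
    frame = 20_000  # hypothetical 20 ms frame, in microseconds
    hops = [(5_000, 1_200, 7, 200), (30_000, 9_000, 3, 200)]
    # Successive blocks of one circuit enter exactly one frame apart.
    print([path_delay(5_300 + k * frame, hops, frame) for k in range(5)])
```

Because successive blocks arrive one frame apart and the slot
correspondence is fixed, every block sees the same wait at each node,
and that wait never exceeds one frame.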

If  a  source  is  circuit-switched,  the  host  node  looks  in 
the source buffer every $\tau$ sec and removes a block, if at
least b bits are present. Otherwise, the slot remains
"empty". Notice that the host node must transmit "something"
during empty slots. That is, the bits that are in the slot
will eventually reach the destination node unless the host
indicates otherwise. (If the decoder is also operating synchronously,
i.e. expecting a new block every frame, the "idle
message" must also reach the decoder. If it is operating asynchronously,
i.e. it "wakes up" only when the destination node
indicates a new block has arrived, the idle message must only
reach the destination node.) In a simple approach, the network
can reserve one of the possible $2^b$ bit patterns to indicate
an empty slot. However, if idles occur frequently, this is an
inefficient source code (in an information theoretic sense).

One  example  of  a  potentially  more  efficient  strategy  is  the 
following.  The  first  bit  of  a  slot  is  used  as  a  flag.  When 
an  idle  occurs,  the  host  node  sets  the  flag,  and  it  can  then 
fill  the  remaining  bits  with  other  information.  Of  course, 
some identifier or address must be provided for the new information.
This procedure is repeated at successive nodes on the
path.  Notice  that  it  is  really  a  "node  to  node"  strategy. 
Different  nodes  can  use  the  empty  slot  in  different  ways  as 
long  as  the  idle  flag  reaches  the  destination.  If  some  node 
actually  has  nothing  else  to  send,  this  idle  must  also  be 
indicated,  but  it  is  of  concern  only  to  the  next  node  on  the 
path.  The  flag  only  encodes  source  idles.  The  cost  of  this 
strategy is 1 bit per slot, and the average gain is $a\,b'$,
where a is the average idle fraction and $b'$ is the number of
bits per slot available for user information (i.e. aside from
the bits required for protocol).
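
The cost/gain comparison for the idle flag amounts to simple
arithmetic, sketched below. The net benefit per slot is the idle
fraction times the number of user bits per slot, minus the one flag
bit, so the flag pays off once the idle fraction exceeds the
reciprocal of the user bits per slot. The function names and the
example of 160 user bits per slot are hypothetical.

```python
def net_gain_bits_per_slot(idle_fraction, user_bits_per_slot, flag_bits=1):
    """Average bits released per slot minus the bits spent on the flag."""
    return idle_fraction * user_bits_per_slot - flag_bits


def break_even_idle_fraction(user_bits_per_slot, flag_bits=1):
    """Idle fraction above which the flag strategy yields a net gain."""
    return flag_bits / user_bits_per_slot


if __name__ == "__main__":
    print(net_gain_bits_per_slot(0.5, 160))
    print(break_even_idle_fraction(160))
```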

Circuit  switching  can  serve  either  type  of  user.  The  bits 
of  a  data  user  do  arrive  in  correct  order  at  the  destination 
though some protocol must be used to separate messages. (We do
not discuss this.) With a voice source, the decoder receives
either a new talkspurt block or an idle every frame. In the
latter case, it presumably reproduces a silence. Because the
service is synchronous, the decoder does not need to use the
silence information included in the first block of each talkspurt.

2.  Store  and  Forward  Switching  -  Conceptually,  the  user 
only has a "promise" of future delivery. Capacity is allocated
on  a  link  by  link  basis,  and  the  information  can,  in  principle, 
be  indefinitely  buffered  at  any  node. 

This  definition  obviously  leaves  many  things  unspecified. 
For  example  -  should  messages  be  broken  into  packets;  how  large 
should  packets  be;  how  should  routes  be  chosen?  A  complete 
discussion  of  these  questions  is  not  appropriate  here.  For 
us,  the  important  feature  of  store  and  forward  switching  is 
that  different  parts  of  a  message  can  incur  different  delays. 

To  focus  on  this,  we  consider  a  version  of  packet  switching 
in  which  each  packet  occupies  one  block.  That  is,  the  source 
output is segmented into blocks, and each block or packet
traverses the network on a store and forward basis. (We do assume
that blocks arrive in correct order.) However, we consider this
version of packet switching with the following "caveat" in mind.

Caveat: With the TDM architecture, one must be careful to
distinguish between the switching discipline and transmission
format.  The  slotted  frame  structure  is  not  needed  for  "packet" 
switching.  To  the  contrary,  it  is  only  used  to  maintain  the 
synchronous  service  of  circuit  switching.  The  bits  in  the  slots 
that  are  not  reserved  for  circuits  effectively  constitute  a 
separate  "virtual"  binary  channel,  which  is  interrupted  when 
reserved slots occur. In principle, the nodes can use this
channel  to  implement  a  variety  of  store  and  forward  schemes. 

We  have  assumed  that  the  store  and  forward  switched  traffic 
respects  the  slot  structure  only  for  convenience.  Note  that 
one "cost" of this is that every idle is at least one slot
long,  i.e.  packet  transmission  cannot  begin  in  the  middle  of 
a  slot. 

It  is  evident  that  the  delay  variability  of  packet 
switching  is  not  a  problem  for  data  users  since  the  "message" 
is  in  the  bits  themselves.  Thus,  the  relevant  tradeoff  is 
between  line  utilization  (or  scheduling  flexibility)  and 
overhead  (addressing).  A  bursty  user  wastes  part  of  the 
capacity  of  a  circuit,  but  packet  switched  blocks  require 
addresses  because  their  transmission  is  not  "prescheduled". 

Notice  that  even  though  the  empty  slots  of  a  circuit  can  be 
encoded with a 1 bit flag (thereby releasing the remaining bits),
the network has no control over the occurrence of these slots.
That is, it does not have the scheduling flexibility of packet
switching.

Delay variability does cause problems for voice. Once the
decoder begins reproduction of a talkspurt, it looks for a new
block every frame. If a block is late, the decoder might try
several things, e.g. "stretch" the previous block. For simplicity,
we assume that a late block is unusable and results in
speech loss. Now consider a talkspurt that lasts M frames, and
suppose the blocks incur delays $d_1, \ldots, d_M$. If the decoder begins
reproduction when the first block arrives, then block i is
lost if $d_i > d_1$. With this approach, substantial loss might occur
if the first block is "lucky", i.e. if $d_1$ is much less than the
average delay. Therefore, the decoder might want to deliberately
postpone reproduction for a time d so that subsequent blocks
have more time to arrive. Thus, there is an end-to-end
delay vs. loss tradeoff. (A practical problem with this scheme
is that the decoder generally cannot know $d_1$. Since it is
the total delay, i.e. $d_1 + d$, that matters to the speaker,
choosing d is difficult. If the decoder does have some
knowledge of the delay distribution, it might be able to make
a "reasonable" guess, e.g. choose d s.t. $d + d_1 < 250$ ms with
high probability.)
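
The delay vs. loss tradeoff can be illustrated with a short sketch.
Given the per-block delays of one talkspurt and a deliberate
postponement d, a block is counted as lost when its delay exceeds
the first block's delay plus d. The function names and the sample
delays are hypothetical, and block indices in the code are 0-based.

```python
def lost_blocks(delays, playout_delay=0.0):
    """Indices of blocks that arrive too late to be reproduced."""
    deadline = delays[0] + playout_delay  # total delay seen by the listener
    return [i for i, d in enumerate(delays) if d > deadline]


def loss_fraction(delays, playout_delay=0.0):
    """Fraction of the talkspurt's blocks that are lost."""
    return len(lost_blocks(delays, playout_delay)) / len(delays)


if __name__ == "__main__":
    delays_ms = [10, 30, 5, 50]  # hypothetical delays; the first block is "lucky"
    for d in (0, 25, 40):
        print(d, delays_ms[0] + d, loss_fraction(delays_ms, d))
```

Increasing the postponement lowers the loss but raises the total
delay, which is the first block's delay plus d.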

Delay  variability  can  also  cause  distortions  in  the 
durations  of  silences  since  the  initial  blocks  of  successive 
talkspurts need not incur the same delay. When the first block
of  a  talkspurt  arrives,  the  decoder  knows  what  the  duration  of 
the  preceding  silence  is  supposed  to  be  since  this  information 
is  contained  in  the  block.  Presumably,  the  decoder  also  knows 
when  it  last  reproduced  a  talkspurt  block.  Thus,  if  the  first 
block  of  a  talkspurt  arrives  before  it  is  needed  (i.e.  before 
the  appropriate  length  silence  has  occurred)  then  the  decoder 
can  postpone  reproduction  in  order  to  recreate  the  appropriate 
silence.  (It  might  want  to  add  even  more  delay  to  give  subsequent 
blocks  of  the  new  talkspurt  more  time  to  arrive,  as  we  have 
discussed.)
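
A sketch of this silence-reconstruction rule follows. The decoder
computes when the new talkspurt should start (the end of the last
reproduced block plus the silence duration carried in the marked
first block) and starts at that time or at the block's arrival,
whichever is later. The names are hypothetical, times are integers
in milliseconds, and any extra postponement for later blocks is left
out.

```python
def playout_start(arrival_time, last_playout_end, silence_frames, frame_length):
    """Return (start time, silence stretch) for the first block of a talkspurt."""
    intended_start = last_playout_end + silence_frames * frame_length
    start = max(arrival_time, intended_start)
    # A positive stretch means the reproduced silence is longer than the original.
    return start, start - intended_start


if __name__ == "__main__":
    # Hypothetical numbers: 20 ms frames, 30 frames of silence.
    print(playout_start(1000, 500, 30, 20))  # block arrives early
    print(playout_start(1300, 500, 30, 20))  # block arrives late
```

An early block lets the decoder recreate the silence exactly; a late
block lengthens the silence by the amount of its lateness.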

In  summary: 

•  Data can use either packet or circuit switching. With
either discipline, all bits do arrive correctly and in
order. Some message separation protocol is required.
The basic tradeoff is between line utilization and
overhead (addressing).

•  Voice can use circuit switching, and end-to-end delay
is the only "distortion" relative to face-to-face
conversation. Packet switching, or what we have termed
"Network TASI", is also possible. In this case, the
network is free to discard or delay blocks in any way
as long as the end-to-end delay is less than about 250 ms,
and the loss (due to outright discard at intermediate
nodes or delay variability at the decoder) is acceptable.



3.  Fast  Circuit  Switching  -  In  this  case,  the  user  is  provided 
a circuit only during "active periods". The terms "fast" and
"active period" are obviously contextual. To discuss some tradeoffs,
we adopt the following conventions:

1)  The  "session"  is  defined  by  the  user,  i.e.  it  is  the  time  that 
the  user  wants  access  to  the  source  buffer. 

2)  The  source  can  place  at  most  one  block  into  the  buffer  in 
any  frame. 

3)  An  active  period  is  a  sequence  of  (consecutive)  frames  in 
which  the  source  does  enter  a  block. 

4)  The  duty  factor  is  the  percentage  of  frames  during  which  the 
source  is  active. 

5)  A  session  contains  "many"  active  periods. 

For  voice,  active  periods  and  talkspurts  coincide.  For  data, 
we  have  adopted  the  view  that  the  source  "meters"  out  a 
message at a rate of one block/frame. The host node initiates
a  circuit  acquisition  at  the  beginning  of  each  active  period. 

We  have  seen  that  the  choice  between  circuit  and  packet 
switching  is  based  on  a  tradeoff  between  utilization  and 
overhead.  Fast  circuit  switching  is  another  way  of  increasing 
the  utilization  for  bursty  (i.e.  low  duty  factor)  sources. 


From  the  user's  viewpoint,  fast  circuit  switching  is  acceptable 
as  long  as  the  network  can  set  up  the  circuit  and  deliver  the 
first  block  of  the  active  period  within  an  acceptable  time. 

(We assume that blocks are buffered until the circuit is set up.)
Notice that fast circuit switching does not pose a delay-variability
problem for talkspurt reproduction because once
the first talkspurt block arrives, subsequent blocks arrive at
a steady rate of one per frame. However, it can cause silence
distortions because the circuit set-up time can differ for
successive talkspurts. If both packet and fast circuit switching
can provide acceptable service to a bursty user, the choice for
the network depends on the relative overhead costs of packet
and fast circuit switching and the "nature of the burstiness".
To understand this, we first need to examine the costs of circuit
set-up.

Circuit  set-up  algorithms  build  paths  on  a  link  by  link 
basis.  The  capacity  reserved  on  a  partially  completed  path  is 
not  available  for  other  circuits  while  the  algorithm  tries  to 
extend  the  path.  In  practice,  the  algorithm  might  backtrack 
if  it  cannot  extend  some  partial  path.  However,  even  if  some 
link  does  not  end  up  in  the  final  path,  it  is  not  released 
until  the  backtracking  actually  reaches  it.  For  attempt  rates 
below  some  "critical  range",  most  circuit  requests  are 
successfully  completed,  and  the  throughput  increases  as  the 
attempt rate increases. (Throughput is the number of successes per unit
time.) Beyond this critical range, the network's resources
are increasingly consumed by partially completed paths, and
the probability of successful completion decreases. In this
region, throughput decreases as the attempt rate increases.

Now as a measure of "burstiness", the duty factor is
simply  the  long-term,  average  activity  fraction,  and  a  given 
duty  factor  can  be  obtained  by  many  combinations  of  activity 
frequency  and  average  duration  of  active  periods.  From  the 
previous  discussion,  it  should  be  clear  that,  for  fixed  duty 
factor, the overhead of fast circuit switching decreases relative
to the overhead of packet switching, as the "burstiness"
tends  to  the  "infrequent  but  long  active  period"  type.  For 
example,  a  user  wishing  to  make  several  large  file  transfers 
in  a  session  might  be  more  efficiently  served  by  fast  circuit 
switching.  (Of  course,  as  the  duty  factor  itself  increases, 
"ordinary"  circuit  switching,  i.e.  providing  a  circuit  for 
the  entire  session,  becomes  relatively  more  attractive  than 
either  packet,  or  "fast  circuit"  switching.) 
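The point about a fixed duty factor can be made concrete with a small sketch. It assumes, purely for illustration, that every active period triggers exactly one circuit set-up; the function name is hypothetical and not taken from the thesis.

```python
def setups_per_second(duty_factor, mean_active_seconds):
    """Rate of active periods (one assumed circuit set-up each).

    duty_factor = mean_active / (mean_active + mean_idle), so the number
    of active/idle cycles per second is duty_factor / mean_active.
    """
    if not 0 < duty_factor <= 1:
        raise ValueError("duty_factor must lie in (0, 1]")
    if mean_active_seconds <= 0:
        raise ValueError("mean_active_seconds must be positive")
    return duty_factor / mean_active_seconds


# Same duty factor, two kinds of burstiness (illustrative numbers only).
frequent_short = setups_per_second(0.1, 0.5)    # many short bursts
infrequent_long = setups_per_second(0.1, 50.0)  # few long bursts
```

For the same duty factor, the "infrequent but long" user generates a hundred times fewer set-ups in this example, which is why the set-up overhead of fast circuit switching weighs less for that user.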


Variable  Rate  Speech  Coding 

So  far,  we  have  assumed  that  the  decoder  uses  all  b  bits 
of  a  block  to  reproduce  a  frame  of  speech.  With  variable  rate 
coding [3], speech is digitized so that a full block contains
information to reproduce speech at several possible levels of
fidelity. That is, a block actually contains, say, b/n sub-blocks,
the slot size is reduced by n, and speech quality
increases as the number of sub-blocks used in reproduction increases.
This type of coding might be useful in congestion
control because by sending fewer sub-blocks, the network can
reduce delay but maintain the "continuity" of the conversation
at  a  lower  quality.  Presumably,  this  is  preferable  to  total 
loss  of  blocks  or  excessive  delays. 

Speech Digitization Techniques

1)  Standard "toll quality" speech uses 64 Kbps PCM
with a sampling rate of 8000 Hz and 8 bit
quantization.

2)  Differential PCM - Transmit the difference between
successive samples. We have seen references to
16 Kbps and 32 Kbps DPCM systems.

3)  Linear Predictive Coding (LPC) - In this approach
the vocal tract output during a frame is modelled as
the output of a linear time invariant system of some
order k. The encoder examines the speech during the
frame and uses its observations to generate values
for the k system coefficients that result in a
"best fit" of the observed samples to the model.
These values and various other parameters are then
quantized and transmitted. Using LPC, it is possible
to achieve intelligible speech with transmission rates
as low as 1 - 2 Kbps.
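The first two techniques in the list can be illustrated with a short sketch. The PCM rate is just the sampling rate times the bits per sample; the DPCM functions below are a deliberately simplified, unquantized version (an assumption made here for clarity, since a real DPCM coder also quantizes the differences to reach 16 or 32 Kbps). All function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of plain PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one.

    The first difference is taken against an assumed initial value of 0.
    """
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the transmitted differences."""
    total = 0
    samples = []
    for d in diffs:
        total += d
        samples.append(total)
    return samples


toll_quality_bps = pcm_bit_rate(8000, 8)  # 64000 bps, i.e. 64 Kbps
```

Because successive speech samples are correlated, the differences tend to be smaller than the samples themselves, which is what lets a practical DPCM coder spend fewer bits per sample.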

Noisy  Links 

Channel  noise  is  combatted  by  error  correcting  coding 
or  error  detection  and  retransmission.  The  latter  is  particularly 
troublesome  for  voice  traffic  because  it  increases  both  delay 
variability  and  average  end-to-end  delay.  Thus  voice  traffic 
must  usually  accept  the  noise  immunity  provided  by  error  correction. 
This is an important consideration in the choice of the digitization
technique. (64 Kbps PCM is relatively insensitive to
errors; LPC speech with 1-2 Kbps rates is sensitive to errors.)

Choice of $\tau$ and b and Circuit Delay

Recall that TDM circuit switching introduces buffering
delays because a block must wait for its slot on the departing
link. This delay at a particular node depends on the choice
of slots serving the circuit, but for a given choice, it is
proportional to $\tau$ and b. For example, one can provide an
equivalent capacity circuit by using a block size of b/2
and a frame size of $\tau$/2. For a given block of b bits at
the source, this circuit will impose only half the buffering
delay on the leading edge of the block.


[Figure: arriving frames and the corresponding departing frames.]


In  the  diagram,  we  have  shown  two  arriving  frames  and  the 
corresponding  departing  frames.  (The  departing  link  has  a  lower 
raw  bit  capacity  so  slots  are  wider,  but  all  slots  contain 
b  bits.) 


Here the frame time and block size are halved, and the
buffering delay of the leading edge is also halved. Essentially,
this is a different interleaving of the b bits in the given
source  block.  This  can  reduce  the  overall  end-to-end  delay 
as  the  following  example  shows. 

We  are  concerned  with  the  time  between  the  beginning  of  a 
talkspurt  at  the  source  and  the  beginning  of  its  reproduction 
at  the  destination.  With  PCM  speech,  the  encoder  can  produce 
digitized  speech  on  a  sample  by  sample  basis,  but  if  activity 
detection  is  desired,  it  cannot  release  the  first  sample  until 
it  has  looked  at  the  output  for  some  more  time.  The  minimum 
time  needed  for  this  depends  on  the  sensitivity  of  the  tracking 
algorithm,  but  it  is  generally  greater  than  the  time  between 
samples (i.e. > 1/8000 s = .125 ms). Thus the activity detection
process requires a frame structure at the encoder with some
minimum frame time $\tau_e$. (This adds a delay $\tau_e$.) Suppose
that within this time the encoder produces $b_e$ bits. Now the
PCM decoder does not need all $b_e$ bits to begin reproduction.
It can essentially work on a sample by sample basis, i.e. its
minimum block size is 8 bits. Thus, if the network transmits
the encoder's block using a network frame time of $\tau_e/2$ and
block size $b_e/2$, the first sample of the talkspurt incurs
less buffering delay in traversing a given circuit. (Of
course,  the  other  components  of  circuit  delay,  namely  the  link 
propagation  delays,  are  not  reduced  by  this.) 


C.  A Speaker Activity Model

In  [4] ,  Brady  introduced  a  continuous  time  Markov  chain 
model  of  the  activity  of  a  single  speaker  A  engaged  in  a 
conversation  with  another  speaker  B.  The  four  possible 
combinations  of  activity  are  denoted  by  TT,  TS,  ST,  SS; 
where T = talk, S = silence, and the first letter in a pair
refers to A's state. His model has the following state diagram.


Note that the combinations TT and SS are split to introduce
more memory. For example if TT is entered from TS, one might
guess that a return to TS is more likely than a passage to ST.
(Presumably, A is dominant and B briefly interrupts.) Hence,
one would guess that the transition rate back to TS is the larger of the two.

Our  objective  is  to  model  the  activity  of  N  independent 
speakers  at  one  end  of  a  set  of  N  independent  conversations 
between  two  sites.  At  this  point,  we  could  extract  the  marginal 
behavior of A, say, and extrapolate to N speakers. (By independence,
the joint process would be a product process.) This leads
to  considerable  complications.  For  example,  A  is  in  talkspurt 
whenever  the  chain  is  in  states  1,  2,  or  3.  The  time  that  a 
Markov  chain  "sojourns"  in  a  subset  of  its  states  is  generally 
not  easy  to  characterize  in  a  "closed  form".  The  following  model 
has  been  proposed  by  Weinstein  [1],  and  we  adopt  it.  Each 
speaker  is  modelled  by  a  two  state  chain. 


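[Figure: two-state chain with states Silent and Talk; transition rate $\lambda$ from Silent to Talk and $\mu$ from Talk to Silent.]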


From the independence of the speakers and properties of Markov
chains, it follows that the process A(t) = number active at
time t is characterized by a birth-death chain.
From this, one can see that the ergodic distribution of A is
binomial on N trials with "Pr (success)" = $\lambda/(\lambda+\mu)$, where $\lambda$ and $\mu$
are the silence-to-talk and talk-to-silence transition rates of a single
speaker. So the mean activity level is $\bar{A} = N\lambda/(\lambda+\mu)$, and the
standard deviation is $\sqrt{N\lambda\mu}/(\lambda+\mu)$.
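The binomial form of the ergodic distribution can be checked numerically. The sketch below assumes the usual birth-death rates for N independent two-state speakers (birth rate (N - i)·lam and death rate i·mu in state i, with lam the silence-to-talk rate and mu the talk-to-silence rate); the function names are hypothetical.

```python
from math import comb, sqrt


def birth_death_equilibrium(n, lam, mu):
    """Equilibrium distribution of the number of active speakers.

    Uses detailed balance: p[i+1] * (i+1) * mu = p[i] * (n-i) * lam.
    """
    weights = [1.0]
    for i in range(n):
        weights.append(weights[-1] * (n - i) * lam / ((i + 1) * mu))
    total = sum(weights)
    return [w / total for w in weights]


def binomial_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean_and_std(pmf):
    """Mean and standard deviation of a distribution on 0..len(pmf)-1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, sqrt(var)
```

With these rates the detailed-balance weights are proportional to C(N, i)(lam/mu)^i, which normalizes to the binomial distribution with success probability lam/(lam + mu).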


As representations of the capacity demand of blocked,
digitized speech, these models have the obvious defect of
using continuous time. But given that the frame time $\tau$ is
much smaller than most active periods and silences, this is
not a serious problem. (We could use discrete time with
geometric distributions and obtain similar results.) A more
substantial question is whether talkspurts and silences can
be approximated by exponential distributions. (We adopt the
notation $x \sim F(\cdot)$ to mean that F is the distribution function
of x. The notation $\exp(\lambda)$ refers to the exponential distribution
with parameter $\lambda$, i.e. $F(x) = 1 - e^{-\lambda x}$. The mean is
then $1/\lambda$.) Brady indicates that the exponential distribution
is  reasonably  good  for  talkspurts  and  suggests  this  is  because 
most  talkspurts  are  what  he  terms  "solitary  talkspurts",  i.e. 
those  that  begin  in  TS  and  end  in  SS  without  any  passage 
through TT. Now for the Brady chain, given that a sojourn
in state 1 ends with a transition directly to 5, the duration
of time spent in state 1 is distributed as $\exp(\sum_j \nu_{1j})$, where the
sum runs over the transitions out of state 1. (For any finite
Markov chain, the a posteriori distribution of the exit time
from a state i, given that the transition was to j, is independent
of j and $\sim \exp(\sum_j \nu_{ij})$.) Of course, the fact that
most talkspurts in the Brady model are exponentially distributed
does not by itself imply that real talkspurts can be so approximated.
The latter assertion must be empirically verified. The
only point is that if the Brady model is accurate and most talkspurts
are solitary, then the exponential approximation will
also be good.

Brady  also  indicates  that  silences  are  not  well  modelled 
by  an  exponential  distribution  and  suggests  that  this  is  because 
there are really two types of silences -- long silences occurring
when a party is listening and shorter silences punctuating the
speech of a dominant talker. But we are really interested only
in the behavior of the aggregate process A(t) and not in individual
speakers. The following "plausibility arguments" indicate
that the birth-death model for A(t) is reasonable to use in
the case of "large N", even if the 2-state single speaker
Markov  model  is  not  good. 

First  consider  the  following,  more  refined,  single 
speaker  model. 


Here there are two silent states $S_1$ and $S_2$ and, to attempt
to capture the desired behavior, one might assume that one silent state has a much longer mean duration than the other.




The relative equilibrium probabilities (i.e. unnormalized) are
$P(T) = \lambda_1\lambda_2$, $P(S_1) = \mu_1\lambda_2$, $P(S_2) = \lambda_1\mu_2$. (Here $\mu_k$ is the
transition rate from T to $S_k$ and $\lambda_k$ the rate from $S_k$ back to T.)
The aggregate vector process for N speakers, $[T(t), S_1(t)]$, where T(t) is
the number in T and $S_1(t)$ is the number in $S_1$, is a Markov
process. When T = i and $S_1$ = j, T is being driven to i - 1
by a Poisson process of rate $i(\mu_1 + \mu_2)$ and to i + 1 by a
Poisson process of rate $j\lambda_1 + (N-i-j)\lambda_2$. Now we make two
assumptions for large N.


1)  T rarely becomes very large (large is for example N - O(1)
or N - O($\sqrt{N}$)) so that N - T is also large, i.e. T and N-T
are both O(N).

2)  Given that N - T is large, the relative populations in $S_1$
and $S_2$ can be replaced by their equilibrium relative
mean values, i.e.

$S_1 \rightarrow (N - T)\,\frac{p(S_1)}{p(S_1) + p(S_2)}$


With these assumptions, we are asserting that the (non-Markov)
marginal process T(t) can be approximated by a birth-death
process of the type used for A(t). That is when T = i, it
is driven to i - 1 by a Poisson process of rate $i(\mu_1 + \mu_2)$
and to i + 1 by a Poisson process of rate

$(N - i)\left[\lambda_1 \frac{p(S_1)}{p(S_1)+p(S_2)} + \lambda_2 \frac{p(S_2)}{p(S_1)+p(S_2)}\right]$.

Alternatively, we are asserting that the replacement of the three state model
by a two state model (for a single speaker) with

$\mu = \mu_1 + \mu_2$,   $\lambda = \frac{\lambda_1 p(S_1) + \lambda_2 p(S_2)}{p(S_1) + p(S_2)}$

is valid in the aggregate.
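A small numerical sketch of this replacement follows. It assumes the three-state structure implied by the rates above (T goes to S1 at rate mu1 and to S2 at rate mu2; S1 and S2 return to T at rates lam1 and lam2); the function names and the sample rates are hypothetical.

```python
def three_state_equilibrium(lam1, lam2, mu1, mu2):
    """Normalized equilibrium probabilities (T, S1, S2).

    The unnormalized weights lam1*lam2, mu1*lam2, lam1*mu2 satisfy the
    balance conditions P(S_k) * lam_k = P(T) * mu_k.
    """
    weights = (lam1 * lam2, mu1 * lam2, lam1 * mu2)
    total = sum(weights)
    return tuple(w / total for w in weights)


def equivalent_two_state(lam1, lam2, mu1, mu2):
    """Rates (lam, mu) of the aggregate-equivalent two-state speaker."""
    _, p_s1, p_s2 = three_state_equilibrium(lam1, lam2, mu1, mu2)
    mu = mu1 + mu2
    lam = (lam1 * p_s1 + lam2 * p_s2) / (p_s1 + p_s2)
    return lam, mu


# Illustrative rates: short pauses (lam1 large) and long listening
# silences (lam2 small).
lam, mu = equivalent_two_state(5.0, 0.5, 1.0, 0.25)
p_talk = three_state_equilibrium(5.0, 0.5, 1.0, 0.25)[0]
```

One consistency check is that the two-state activity fraction lam/(lam + mu) equals the equilibrium probability of T in the three-state model, so the mean activity level is preserved by the replacement.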


As another approach, we might attempt to extrapolate a
limit theorem from renewal theory to alternating renewal processes.
This theorem is as follows. (See Feller [5] p. 370.) Consider
the process obtained by "merging" N independent renewal processes,
i.e. look at the collective sequence of renewal epochs.
Suppose the mean renewal time for the kth process is $1/\mu_k$ and
that the individual processes are "rare", i.e. no single process
contributes greatly to the merged process. Set $\alpha = \mu_1 + \ldots + \mu_N$.
Then in the "steady state", the waiting time for the next
event in the merged process (which is not a renewal process
in general) is approximately distributed as $\exp(\alpha)$.

Now consider N identical, alternating renewal processes;
say each process has two states 0 and 1. Suppose the renewal
time in state 0 has distribution function $F_0$ with mean $1/\lambda$,
and for state 1, these are $F_1$ and $1/\mu$ respectively. Let M(t)
be the merged process, M(t) = number in state 1. We would
like to assert that M(t) can be approximated by the birth-death
model. More precisely, we would like to show that
when M = i, the time until one of the i processes in state 1
goes to 0 is approximately $\sim \exp(i\mu)$, and analogously for the N-i
processes in state 0. (Note that this is not the assertion
that the time until the next $1 \to 0$ transition is $\sim \exp(i\mu)$. The
next $1 \to 0$ transition, i.e. the next time M decreases, may result
from one of the (N-i) processes in 0 going to 1 and back to 0.)
The derivation of the limit theorem rests on the fact that, in
steady state, the waiting time for the kth renewal process has
its so called residual lifetime distribution. (This distribution
is defined as follows. Suppose one examines a renewal process
at some time t and asks for the distribution of time $Y_t$ until
the next renewal. Then one can show that as $t \to \infty$, the density
of $Y_t$ approaches $(1 - G(y))/m$, where G(y) is the renewal distribution
and $m = \int_0^\infty y\,dG(y) = \int_0^\infty (1 - G(y))\,dy$ is its mean.)
To extend this limit theorem we would have to show that in
the steady state and given M = i, the $1 \to 0$ waiting times
for the i processes in state 1 are distributed as the residual
lifetime distribution associated with $F_1(x)$ (and analogously
for the N-i processes in state 0). Of course, once a
transition occurs, say a $1 \to 0$ occurs first, the particular
process that changed joins the other N-i processes in state 0,
and its renewal time distribution is just the original $F_0(x)$
rather than the residual lifetime distribution. However, for
those i such that i and N-i are both large, its effect on
the total might be negligible. That is, we might always be
able to treat the processes as if their renewal times are
distributed as the respective residual lifetimes.


Note that if we count states $S_1$ and $S_2$ as a superstate in
the three state single speaker model, the resultant two state
process is an alternating renewal process with one renewal
distribution equal to the mixture

$\frac{\mu_1}{\mu_1+\mu_2} \exp(\lambda_1) + \frac{\mu_2}{\mu_1+\mu_2} \exp(\lambda_2)$.

Suppose that state T is state 1, and the silent superstate is
state 0. If the generalization of the limit theorem is correct,
then when M = i, the distribution of time until the next transition
of one of the (N-i) silent speakers can be approximated
by $\exp(\alpha)$ where

$\alpha = (N-i)\,(\mu_1+\mu_2) \left( \frac{\mu_1}{\lambda_1} + \frac{\mu_2}{\lambda_2} \right)^{-1} = \frac{N-i}{p(S_1)+p(S_2)} \left[ \lambda_1 p(S_1) + \lambda_2 p(S_2) \right]$.

This is the same as
the previous formula. That is, we can view the initial approximation
made by replacing the number in $S_1$ and $S_2$ by their
respective means (given that the number in T = i) as a special
case of this limit theorem. (Since $F_1$ is $\exp(\mu_1 + \mu_2)$, the
distribution $\exp(i(\mu_1 + \mu_2))$ is exact for the speakers in T.
This is because the residual lifetime distribution associated
with an exponential distribution is the same exponential
distribution.)
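The closing remark about the exponential distribution can be verified directly from the residual lifetime density $(1 - G(y))/m$. The short derivation below uses a generic rate $\mu$ as a stand-in parameter.

```latex
\begin{align*}
G(y) &= 1 - e^{-\mu y}, \qquad
m = \int_0^\infty \bigl(1 - G(y)\bigr)\,dy
  = \int_0^\infty e^{-\mu y}\,dy = \frac{1}{\mu}, \\
\frac{1 - G(y)}{m} &= \mu\, e^{-\mu y},
\end{align*}
```

which is again the density of $\exp(\mu)$, so the residual lifetime of an exponential renewal time has the same exponential distribution.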

Weinstein [1] does indicate that for N > 25-30, the birth-death
model for aggregate activity appears to be as good as
the Brady model, in the sense that if one compares simulations
of the birth-death model on N speakers and N independent,
marginal Brady speakers, the empirical distributions are similar.
(By N independent, marginal Brady speakers we mean run
simulations of N Brady models and examine the process M(t) =
number of "A speakers" in talkspurt at t.)

D.  The Loss Fraction

For the birth-death model of A(t), the transition probability
$\Pr[A(t) = \ell \mid A(0) = i]$ approaches a limit $p_\ell$ that is
independent of i. This $p_\ell$ is called the equilibrium or
ergodic probability of state $\ell$ (See Chapter II) and is
given by $\binom{N}{\ell} \left(\frac{\lambda}{\lambda+\mu}\right)^{\ell} \left(\frac{\mu}{\lambda+\mu}\right)^{N-\ell}$. It admits an interpretation
as the limiting fraction of time the chain spends in state $\ell$.
This expression actually applies to more general models of
speaker activity. For example, if each speaker is modelled
as a two-state alternating renewal process with mean silence
$1/\lambda$ and mean talkspurt $1/\mu$, and if the notion of an equilibrium
is well defined for this process, then the binomial distribution
for $A(\infty)$ follows from independence. See Weinstein [1].
Once expressions for the $\{p_\ell\}$ are obtained, one can substitute
into the formula

$\phi = \frac{\sum_{\ell = C_v + 1}^{N} (\ell - C_v)\, p_\ell}{\sum_{\ell = 1}^{N} \ell\, p_\ell}$

to obtain the average loss.
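The average loss can be evaluated directly from the binomial equilibrium probabilities. The sketch below is a straightforward transcription of the ratio above (expected excess of activity over capacity, divided by the mean activity); the function and parameter names are hypothetical.

```python
from math import comb


def loss_fraction(n_speakers, capacity, activity):
    """Average fraction of speech lost when n_speakers share capacity units.

    activity is the single-speaker equilibrium activity fraction
    lam / (lam + mu); the number active is Binomial(n_speakers, activity).
    """
    pmf = [
        comb(n_speakers, l) * activity**l * (1 - activity) ** (n_speakers - l)
        for l in range(n_speakers + 1)
    ]
    # Numerator: expected number of active speakers in excess of capacity.
    lost = sum((l - capacity) * pmf[l] for l in range(capacity + 1, n_speakers + 1))
    # Denominator: mean number of active speakers.
    offered = sum(l * pmf[l] for l in range(1, n_speakers + 1))
    return lost / offered


# Capacity 36 and activity fraction .4, for several values of N.
losses = {n: loss_fraction(n, 36, 0.4) for n in (60, 70, 75, 80, 85, 90, 100)}
```

With capacity 36 and activity fraction .4, these values can be compared with the table quoted from Weinstein [1] later in this section.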

Although this formula only depends on the means of the
silence and talkspurt distributions, the actual manner in
which loss occurs, e.g. 1% of each talkspurt vs. 1 of every
100 talkspurts in its entirety, depends on the complete distributions.
To see this, consider the following examples. Suppose
N = 2, $C_v$ = 1, and the activity of each speaker is modelled as
an alternating renewal process with silence $\sim \exp(\lambda)$ and a
talkspurt that is a mixture of two deterministic times $\tau_1$ and
$\tau_2$. There are two cases:

•  $\tau_1 \gg 1/\lambda > \tau_2$, where the mixing probability $\alpha$ is
chosen so that $\alpha\tau_1 + (1 - \alpha)\tau_2 = 1/\lambda$, i.e. the long talkspurt
occurs rarely. Then it is evident that one speaker can
occasionally lose several entire talkspurts while the other
ties up the circuit with a long talkspurt.

•  $\tau_1 = \tau_2 = 1/\lambda$. In this case a speaker will never lose
an entire talkspurt since the event of simultaneous completion
of silences has zero probability. However, the mean silence
and talkspurt times are the same in both cases so the average
losses are the same.

From the previous expression for $\phi$ we can obtain two
simple bounds

$\phi \leq \frac{N - C_v}{\bar{A}} \Pr(A \geq C_v + 1)$

$\phi \geq \max\left\{0,\ \frac{\bar{A} - C_v}{\bar{A}} \Pr(A \geq \bar{A} + 1)\right\}$

From the properties of the binomial distribution, we know
that if N and $C_v$ approach infinity with $C_v = \bar{A}(1 + \epsilon)$, $\epsilon > 0$,
then $\Pr(A \geq C_v + 1)$ approaches 0. (Essentially, this is because
the standard deviation is $O(\sqrt{N})$, which approaches 0 as a percentage
of the mean, $\bar{A} = N\lambda/(\lambda+\mu)$, as $N \to \infty$.) Since this works
for any $\epsilon > 0$, it follows that we can approach the TASI advantage
$N/\bar{A}$ arbitrarily closely (in percentage terms) with no loss, as
$N \to \infty$. Note $N/\bar{A} = (\mu + \lambda)/\lambda$, which is the inverse of the equilibrium
activity fraction for one speaker. It also follows from
the properties of the binomial distribution that as $N \to \infty$,
$\Pr(A \geq \bar{A} + 1)$ approaches a positive limit. If we choose
$C_v = \bar{A}(1 - \epsilon)$, $1 > \epsilon > 0$, then the lower bound is
$\frac{\bar{A}\epsilon}{\bar{A}} \Pr(A \geq \bar{A} + 1) = \epsilon \Pr(A \geq \bar{A} + 1)$. By the previous remark, this
remains bounded away from 0 as $N \to \infty$. That is, if we
attempt TASI with $C_v = \bar{A}(1 - \epsilon)$, then for any $1 > \epsilon > 0$, the
loss is bounded away from 0 in the limit. For these reasons,
the ratio $N/\bar{A} = 1 + \mu/\lambda$ is called the maximum TASI
advantage.

The following table is taken from Weinstein [1]. In this
table, $C_v$ = 36 and $\lambda/(\lambda+\mu)$ = .4.

  N      $\bar{A}$    TASI Advantage    Utilization = $\bar{A}/C_v$    Loss

  60     24     1.66              .66                  3.8 x 10^-5
  70     28     1.95              .77                  1.5 x 10^-3
  75*    30     2.08              .83                  .005*
  80     32     2.22              .88                  .014
  85     34     2.36              .94                  .029
  90     36     2.5               1                    .05
  100    40     2.77              1.11                 .11

* The recommended operating point is N = 75.

Although the TASI advantage $1 + \mu/\lambda$ can be approached in
the limit with no loss, for any finite N and $N > C_v > \bar{A}$, the
loss is still nonzero. That is, although the mean activity
level is $\bar{A}$, the activity process does exhibit fluctuations
above $\bar{A}$, and this results in loss. On the other hand, if
infinite buffering of speech is allowed, then for finite N and
$C_v$, we can achieve a stable queue and no loss with any $C_v > \bar{A}$.
This follows from the queuing theory "metaprinciple"
that stability is present as long as the service rate exceeds
the average arrival rate of work. If the buffer is finite,
overflow speech is lost, but the loss fraction decreases as
the buffer size increases. (Of course, the average delay also
increases.) Thus there is a loss vs. delay vs. TASI advantage
tradeoff. A formal model of this single-link, buffered TASI
multiplexer has been developed and analyzed by Berger [6].


E.  A  Voice/Data  Link  Model 

A  link  having  capacity  C  bps  is  shared  by  N  callers,  whose 
activity  is  modelled  by  the  birth-death  chain,  and  an  infinite 
data buffer. Data arrives in messages. The message arrival
point process is taken as a renewal process with mean rate n messages/sec.
Message  lengths  are  modelled  as  i.i.d.  random  variables  with 
mean  length  bits. 

Remarks 

•  Although the term "bit" is used for a "unit" of message
length,  we  treat  these  lengths  as  continuous  variables. 

•  A  few  of  our  results  will  apply  to  general  arrival 
processes,  but,  in  the  main,  we  will  assume  Poisson 
arrivals  and  exponential  length  distributions.  For 
a  queue  fed  by  many  small,  independent  sources,  the 
assumption  of  Poisson  arrivals  is  reasonable  because 
of  the  limit  theorem  mentioned  in  I.C. 

For now we assume that the allocation of capacity is
given and depends only on A(t). That is, when A(t) = i,
the data backlog is decreasing at some rate $r_i$ bps, and we are
not concerned with how the remaining $(C - r_i)$ bps is used to
satisfy the speakers. In a later chapter we will discuss the
control problem -- how should the capacity be divided given
some cost functions associated with an allocation policy.

Now, our main concern is to analyze a given allocation. With
this assumption, a sample function of the data backlog has
the following general shape.

[Figure: sample function of the data backlog versus time.]


The  jumps  indicate  message  arrivals,  and  changes  in  slope 
reflect  changes  in  speaker  activity. 

In  terms  of  the  TDM  architecture,  the  model  is  a  limiting 
case  in  which  the  frame  time  and  block  size  are  very  small 
compared  to  speaker  talkspurt  and  silence  times,  i.e.  we 
ignore the discrete nature of the TDM architecture. As indicated,
this is not an unrealistic assumption.

To describe the message arrival process, we adopt the
standard "A/B" notation from Queuing Theory. That is, A is
the message interarrival time distribution and B is the message
length distribution. M stands for the Markovian or exponential
distribution, G is general etc. It should be noted
that, although message lengths are i.i.d., service times, i.e.
actual  times  to  transmit  messages,  are  not.  The  dependencies 
enter  through  the  speaker  activity  process,  which  will  also 
be  called  the  phase  process . 


II.  Notation and Background


In this chapter, a brief discussion of finite, time-homogeneous
Markov chains is presented. The material is to a large
extent  an  adaptation  of  Keilson  [7].  The  purpose  is  to  establish 
some  notation  and  a  body  of  "quotable  results". 

Vectors are denoted by single underscoring, and matrices
by double underscoring. The symbol $\triangleq$ means "is defined as".
If x is a vector, the associated diagonal matrix $x_D$ is defined by
$(x_D)_{ij} \triangleq x_i \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. The vector $\underline{1}$ is a vector of all
ones, and if n is a scalar, $\underline{n} \triangleq n\,\underline{1}$. The identity matrix,
which would be $\underline{1}_D$ in the above notation, is denoted by I. If
x and y are vectors or matrices of the same dimension, the
statements $x = y$, $x \geq y$, $x > y$ mean that the specified relation
holds on a componentwise basis. If $A \geq 0$ is a matrix then A
is called substochastic if $A\,\underline{1} \leq \underline{1}$, stochastic if $A\,\underline{1} = \underline{1}$, and
strictly substochastic if $A\,\underline{1} < \underline{1}$.
A is called primitive if for some integer $m \geq 1$, $A^m > 0$.
Associate an n-node directed graph with A by placing an arc from
$i \to j$ iff $A_{ij} > 0$. Then one can show that A is irreducible iff the
graph is strongly connected, and A is primitive iff A is irreducible
and the g.c.d. of all cycle lengths is 1.
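These two properties can be tested mechanically from the zero pattern of A. The sketch below does not compute cycle lengths; instead it relies on two standard equivalent tests that are not stated in the text: A is irreducible iff (I + A)^(n-1) > 0, and A is primitive iff A^((n-1)^2 + 1) > 0 (Wielandt's bound). The function names are hypothetical.

```python
def _bool_mul(a, b):
    """Boolean matrix product: entry (i, j) is True iff some k links i to j."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def _bool_pow(a, m):
    """m-th Boolean power of a square pattern, by repeated squaring."""
    n = len(a)
    result = [[i == j for j in range(n)] for i in range(n)]
    base = a
    while m:
        if m & 1:
            result = _bool_mul(result, base)
        base = _bool_mul(base, base)
        m >>= 1
    return result


def is_irreducible(a):
    """True iff the graph of the nonnegative matrix a is strongly connected."""
    n = len(a)
    pattern = [[a[i][j] > 0 or i == j for j in range(n)] for i in range(n)]
    return all(all(row) for row in _bool_pow(pattern, n - 1))


def is_primitive(a):
    """True iff some power of the nonnegative matrix a is strictly positive."""
    n = len(a)
    pattern = [[a[i][j] > 0 for j in range(n)] for i in range(n)]
    return all(all(row) for row in _bool_pow(pattern, (n - 1) ** 2 + 1))
```

Only the positions of the positive entries matter, so the powers are taken over Boolean patterns and cannot overflow.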

Theorem PRF

Let $A \geq 0$ be irreducible. Then A has a real positive eigenvalue
r with the following properties.

1)  If $\lambda$ is any other eigenvalue, then $|\lambda| \leq r$.

2)  r is of algebraic and geometric multiplicity one. That is r is
a simple root of the characteristic polynomial and has a one
dimensional eigenspace. The associated right eigenvector
$x_R$ can be chosen real and positive, i.e. $x_R > \underline{0}$. r is the
only eigenvalue having such an eigenvector.

3)  If m and M are the minimum and maximum row sums then
$m \leq r \leq M$. Strict inequality holds in both cases unless M = m.

4)  If there are h eigenvalues of modulus r (counting r), then
the spectrum of A is mapped into and onto itself by a
rotation of the complex plane of angle $2\pi/h$.

5)  h = 1 iff A is primitive. r is called the spectral radius
of A, sp(A); the PRF root; or PRF eigenvalue.

Remark: Since A is irreducible iff $A^T$ is (reverse directions
of arcs), we can apply PRF to $A^T$ as well. This yields possibly
different bounds on r in terms of column sums and a positive left
eigenvector $x_L$ for A.

In the primitive case, the behavior of $A^m$ as $m \to \infty$ is
determined by r, $x_R$ and $x_L$. Let $J \triangleq x_R x_L^T$, where $x_R$ and $x_L$
are normalized so that $x_L^T \underline{1} = 1$, $x_L^T x_R = 1$. Let $\Omega = A - rJ$.
Then one can show

1)  $x_L^T \Omega = \underline{0}^T$, $\Omega x_R = \underline{0}$, $J\Omega = \Omega J = 0$.

2)  $J^2 = J$, $JA = AJ = rJ$.

3)  J is a dyad having 1 as an eigenvalue of geometric and
algebraic multiplicity one; and 0 as an eigenvalue of
algebraic and geometric multiplicity n-1.

4)  The eigenvalues of $\Omega$ are
i)  0 with associated eigenvectors $x_R$ and $x_L$
ii)  the eigenvalues of A other than r. If $Az = \lambda z$,
$\lambda \neq r$, then $\Omega z = \lambda z$.

5)  $A^m = r^m J + \Omega^m$ and since r is uniquely maximal, $(\Omega/r)^m \to 0$
as $m \to \infty$. This implies $(A/r)^m \to J$.


The algorithm is essentially a computation of

$\frac{x_0^T A^m}{x_0^T A^m \underline{1}} = \frac{x_0^T [r^m J + \Omega^m]}{x_0^T [r^m J + \Omega^m]\, \underline{1}}$.

By definition of J this is

$\frac{(x_0^T x_R)\, x_L^T + x_0^T (\Omega/r)^m}{(x_0^T x_R) + x_0^T (\Omega/r)^m \underline{1}}$.

Since $(\Omega/r)^m \to 0$ as $m \to \infty$, this converges to $x_L^T$. Since
$x_L^T A = r\, x_L^T$, we can then find r by $r = x_L^T A\, \underline{1}$. Note that
there is no particular reason to increment by only one power of
A in step 2. We could precompute $A^m$ for some large m and then
use it. The limitation is that if r > 1 or r < 1, then $A^m$
(without scaling) may overflow or underflow the machine. One
simple way around this is to simultaneously compute large powers
of A and rescale. We can do this by successive squarings. (Also
note that $x_0$ is arbitrary as long as it's positive.)

1.  m = 0;  $A_0 = A / (x_0^T A\, \underline{1})$

2.  $A_{m+1} = A_m^2 / (x_0^T A_m^2\, \underline{1})$

One may easily check that $A_m = A^{2^m} / (x_0^T A^{2^m} \underline{1})$. The convergence
rate is determined by the rate at which $(\Omega/r)^m$ approaches 0,
this being geometric with rate $r_2/r$ where $r_2$ is the second
largest eigenvalue modulus.
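The rescaled successive squaring can be sketched in plain Python as follows. This is an illustrative reading of the procedure, assuming a primitive nonnegative matrix and a fixed number of squarings instead of a convergence test; the function names and the default of 30 squarings are hypothetical choices.

```python
def _matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def prf_by_squaring(a, x0=None, squarings=30):
    """Estimate the PRF root r and the left eigenvector x_L of a.

    Each squaring is followed by a rescaling by x0^T (.) 1, so the
    entries stay bounded even though the implied power of a is 2**squarings.
    """
    n = len(a)
    if x0 is None:
        x0 = [1.0] * n  # any positive starting vector will do

    def rescale(m):
        s = sum(x0[i] * m[i][j] for i in range(n) for j in range(n))
        return [[v / s for v in row] for row in m]

    d = rescale(a)
    for _ in range(squarings):
        d = rescale(_matmul(d, d))

    # x0^T d sums to one by construction and approximates x_L^T.
    x_left = [sum(x0[i] * d[i][j] for i in range(n)) for j in range(n)]
    # r = x_L^T a 1, using the normalization x_L^T 1 = 1.
    r = sum(x_left[i] * a[i][j] for i in range(n) for j in range(n))
    return r, x_left
```

For a stochastic matrix the returned root is 1 and the left eigenvector is the stationary distribution, which gives a convenient sanity check.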

Discrete  Time  Chains 

Suppose a Markov chain has M states {1, ..., M}, and
S(k) denotes the state after the kth transition. The evolution
of S is characterized by a stochastic matrix A called the
transition probability matrix. If S(0) is drawn according to
a distribution $\pi(0)$, then the distribution of S(k) is $\pi(k)$,
where $\pi^T(k) = \pi^T(0) A^k$. The existence of a limit for $\pi(k)$
as $k \to \infty$ is a fundamental question. There are three cases.

1)  If A is primitive then a limiting e exists, independent
of $\pi(0)$. Further e is the unique left eigenvector --
$e^T A = e^T$, $e > \underline{0}$, $e^T \underline{1} = 1$. This follows from the fact
that $A^k$ approaches $J = \underline{1}\, e^T$ as $k \to \infty$. (Because of the
primitivity, all eigenvalues other than 1 are strictly
inside the unit circle.) e is called the ergodic,
equilibrium, or steady state distribution, and the chain
is called ergodic.

2) A irreducible but imprimitive: The irreducibility guarantees the existence of a unique left eigenvector $p$ s.t. $p > 0$ and $p^T \mathbf{1} = 1$. But $\pi(k)$ exhibits oscillatory behavior, and $p$ is not a true limit. Recall that A imprimitive implies that there exist i, j s.t. j is reachable from i only at periodically spaced transition epochs. (From earlier remarks, it is clear that one can take j = i, i.e. there is a node i in the graph s.t. every cycle beginning and ending at i has length that is a multiple of some integer $\ge 2$.) To see the oscillatory behavior, consider the simple case of

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which corresponds to the chain

[Figure: two-state chain in which each state moves to the other with probability 1]

In this case $p = (\tfrac12, \tfrac12)$, and S does spend half its time in each state, asymptotically. However,

$$A^k = \frac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \frac{(-1)^k}{2} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$

So if $\pi(0) = (1,0)$, then $\pi(k) = (\tfrac12 + (-1)^k \tfrac12,\ \tfrac12 - (-1)^k \tfrac12)$, which oscillates between (1,0) and (0,1).
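The oscillation can be observed numerically. The following is a minimal sketch, not part of the original development: the helper names `step` and `distribution_after` and the primitive comparison matrix `mixing` are illustrative assumptions. It propagates $\pi^T(k) = \pi^T(0) A^k$ for the imprimitive two-state matrix above and for a primitive matrix with the same stationary distribution.

```python
def step(pi, A):
    """Return the row vector pi^T A for a distribution pi and stochastic matrix A."""
    m = len(A)
    return [sum(pi[i] * A[i][j] for i in range(m)) for j in range(m)]


def distribution_after(pi0, A, k):
    """Distribution of S(k) when S(0) is drawn from pi0."""
    pi = list(pi0)
    for _ in range(k):
        pi = step(pi, A)
    return pi


flip = [[0.0, 1.0], [1.0, 0.0]]    # irreducible but imprimitive (period 2)
mixing = [[0.1, 0.9], [0.9, 0.1]]  # hypothetical primitive matrix, same stationary p

for k in range(4):
    print(k, distribution_after([1.0, 0.0], flip, k))
print(distribution_after([1.0, 0.0], mixing, 200))
```

For `flip` the distribution never settles, whereas for `mixing` the self-loops remove the periodicity and the iterates approach (1/2, 1/2).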


Remark: If $p$ is a left eigenvector with eigenvalue 1, then $\pi(0) = p$ implies $\pi(k) = p$, $\forall k$. If $p \ge 0$ and $p^T \mathbf{1} = 1$, then $p$ is called a stationary distribution for obvious reasons. Irreducibility guarantees the existence and uniqueness of a stationary $p$, and primitivity guarantees that $\pi(k)$ approaches this $p$ in the limit. We will only deal with irreducible, nonnegative matrices, so the term "stationary distribution" will always mean "unique stationary distribution".


3) A reducible: A is stochastic so 1 is trivially an eigenvalue. However, it can have geometric and algebraic multiplicities larger than 1, though there is at least one left eigenvector which can be chosen nonnegative. Gantmacher [8] discusses the possibilities in detail.


Continuous  Time  Chains 

The transition probabilities of discrete time are replaced by transition rates $v_{ij}$, where $v_{ij} \ge 0$, $v_{ii} = 0$ $\forall i$. When S = i, the next state and time of transition are determined by a set of competing Poisson processes with rates $v_{ij}$. If an event from process j occurs first, the state goes to j at the time of occurrence. (Since $v_{ii} = 0$, there are effectively at most M-1 processes. If $v_{ij} = 0$ $\forall j$, the state i is called absorbing. We only consider those chains for which $\sum_j v_{ij} > 0$, $\forall i$.) A straightforward calculation shows that no matter which j "wins", the a posteriori distribution on the holding time in i is $\exp(v_i)$ where $v_i = \sum_j v_{ij}$. That is, the minimum of the exponential random variables with parameters $v_{ij}$ is distributed as $\exp(v_i)$ regardless of which variable realizes the minimum. Further, the probability that j "wins" is $v_{ij}/v_i$. This leads to the following equivalent view of the chain. When S = i, the time of the next transition is drawn from $\exp(v_i)$, and at this time, the next state is drawn according to the probabilities $v_{ij}/v_i$. That is, there is a discrete time chain with transition probability matrix $v_D^{-1} v$ (where $v_D$ = diagonal($v_i$) and $(v)_{ij} = v_{ij}$), and a state dependent clock determining the actual time of transitions.

The role of the discrete time A matrix is played by the infinitesimal generator matrix $Q \triangleq v - v_D$. The transition probability matrix P(t), where $P_{ij}(t) = \Pr[S(t) = j \mid S(0) = i]$, is the unique solution of the so called forward and backward equations $\dot P(t) = Q P(t) = P(t) Q$ s.t. $P(0) = I$. The solution P(t) is the well known matrix exponential

$$\exp(Qt) = \sum_{n=0}^{\infty} \frac{(Qt)^n}{n!}.$$

If S(0) is drawn from a distribution $\pi(0)$, then $\pi^T(t) = \pi^T(0) P(t)$.

The existence of limits is determined by the irreducibility of the nonnegative matrix $v$, or equivalently, by the connectivity of the graph in which an arc $i \to j$ is present iff $v_{ij} > 0$. If irreducibility is present, then $\pi(t)$ approaches a limit independent of $\pi(0)$, and the chain is termed ergodic. We assume irreducibility for the discussion in this chapter. The limit $p$ satisfies $p^T Q = 0^T$, $p > 0$, $\sum_i p_i = 1$ and is called the stationary or ergodic distribution. (Note $Q \mathbf{1} = 0$ by construction, so 0 is an eigenvalue.) Since a limit exists, all nonzero eigenvalues of Q must have nonpositive real parts, else $\exp(Qt)$ explodes, and such a solution is not probabilistically meaningful. However, we have not introduced a notion analogous to primitivity to prevent oscillation, i.e. to rule out purely imaginary eigenvalues. Such a notion is not necessary because in continuous time, any state j reachable from i is reachable within any positive time, i.e. $P_{ij}(t) > 0$ $\forall t > 0$. We derive the above properties of the continuous time chain by introducing the following uniformization procedure.

Let $\hat v$ be chosen $\ge \max_i \{v_i\}$, and set $A_{\hat v} \triangleq I + Q/\hat v$. Then $A_{\hat v} \ge 0$, $A_{\hat v}$ is stochastic, and

$$\exp(Qt) = \exp(-\hat v t (I - A_{\hat v})) = \sum_{n=0}^{\infty} e^{-\hat v t} \frac{(\hat v t)^n}{n!} A_{\hat v}^n.$$

This has a simple probabilistic interpretation. A chain is driven by a single Poisson clock of rate $\hat v$. At each "bong" a transition occurs according to the matrix $A_{\hat v}$. Such a transition occurs from i to j, $i \ne j$, with probability $v_{ij}/\hat v$ and from i to i with probability $1 - v_i/\hat v$. Notice that if $\hat v > v_i$ this self-loop has positive probability. This reflects the fact that state i is being driven faster than its "natural" rate.

The matrix $A_{\hat v}$ is irreducible since $v$ is, and $\lambda$ is an eigenvalue of $A_{\hat v}$ iff $\hat v(\lambda - 1)$ is an eigenvalue of Q. The corresponding eigenvectors are the same. Applying the PRF theorem to $A_{\hat v}$ and using these relations between Q and $A_{\hat v}$, we can conclude that

• 0 is an eigenvalue of Q with algebraic and geometric multiplicity of 1,

• all other eigenvalues of Q have negative real parts,

• Q and $A_{\hat v}$ have the same stationary distribution $p$.

We now show that $\pi^T(0) \exp(Qt)$ approaches $p^T$ as $t \to \infty$, for any $\pi(0)$.

It suffices to show that $\exp(Qt)$ approaches $J = \mathbf{1} p^T$. Let $B \triangleq A_{\hat v} - J$. Since $J^2 = J$ and $JB = BJ = 0$, we have $(A_{\hat v})^n = J + B^n$ for $n \ge 1$. This implies

$$\exp(Qt) = \sum_{n=0}^{\infty} e^{-\hat v t} \frac{(\hat v t)^n}{n!} A_{\hat v}^n = (1 - e^{-\hat v t}) J + \exp(-\hat v t (I - B)),$$

where the factor $1 - e^{-\hat v t}$ accounts for the n = 0 term and tends to 1. By construction, all eigenvalues of B are on or inside the unit circle, but 1 itself is not an eigenvalue of B. Therefore, the eigenvalues of $I - B$ have positive real parts, which implies that $\exp(-\hat v t (I - B)) \to 0$ as $t \to \infty$.

One useful consequence of uniformization is an algorithm for finding the stationary distribution of Q. The previous algorithm does not work directly on Q because Q contains transition rates, and its powers have no probabilistic meaning. However, the stationary distribution of Q is the same as that of $A_{\hat v}$ for any uniformizing rate $\hat v$. The previous algorithm does work on $A_{\hat v}$ provided that it is primitive. The following argument shows that any choice of $\hat v > \max_i\{v_i\}$ yields a primitive $A_{\hat v}$. For such a choice, $(A_{\hat v})_{ii} = 1 - v_i/\hat v > 0$, $\forall i$. This implies that every node in the graph associated with $A_{\hat v}$ has a self-loop. In turn, this implies that the g.c.d. of all cycle lengths is 1. Primitivity then follows from irreducibility.
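The procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's own algorithm: we assume the "previous algorithm" amounts to repeated multiplication by the stochastic matrix (power iteration), and the names `uniformize` and `stationary` as well as the default choice $\hat v = 1.1 \max_i v_i$ are hypothetical.

```python
def uniformize(rates, v_hat=None):
    """Build A = I + Q / v_hat from a rate matrix with rates[i][j] = v_ij, v_ii = 0."""
    m = len(rates)
    v = [sum(row) for row in rates]        # total exit rate v_i of each state
    if v_hat is None:
        v_hat = 1.1 * max(v)               # strictly above max v_i, so A is primitive
    A = [[rates[i][j] / v_hat for j in range(m)] for i in range(m)]
    for i in range(m):
        A[i][i] = 1.0 - v[i] / v_hat       # self-loop probability
    return A


def stationary(A, iters=10000, tol=1e-13):
    """Power iteration on the left: pi^T <- pi^T A until successive iterates agree."""
    m = len(A)
    pi = [1.0 / m] * m
    for _ in range(iters):
        nxt = [sum(pi[i] * A[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


rates = [[0.0, 1.0], [3.0, 0.0]]           # two states: 0 -> 1 at rate 1, 1 -> 0 at rate 3
print(stationary(uniformize(rates)))
```

A larger $\hat v$ keeps the matrix primitive but slows convergence, since the self-loops then dominate each step.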

If a continuous time chain is ergodic, the ergodic probability $p_i$ admits an interpretation as the limiting fraction of time the chain spends in state i. For a discrete time chain, the ergodic probability $p_i$ is the limiting fraction of transition epochs at which the chain leaves state i. (One could also count the fraction of epochs at which state i is entered. In equilibrium, these are the same, i.e. looking immediately before or after a transition makes no difference.) If $A_{\hat v}$ is a uniformizing matrix for a Q matrix, then $A_{\hat v}$ is the transition probability matrix for a discrete time chain embedded at "clock bongs". Intuitively, $A_{\hat v}$ and Q have the same ergodic probabilities because (uniform rate) Poisson sampling takes a truly "random" look at the continuous time chain. Now, if the continuous time chain has a transition rate matrix $v$ (so that $Q = v - v_D$), the stochastic matrix $v_D^{-1} v$ is a transition probability matrix for a discrete time chain embedded at transition epochs of the continuous time chain as determined by the state dependent clock. Since $v$ is irreducible, $v_D^{-1} v$ has a stationary distribution $\hat p$. However, $\hat p$ and $p$, the ergodic distribution for Q, need not be the same because the holding times in various states need not be the same. That is, even though the holding time in each state is exponentially distributed, the means are different, so that it is not uniform Poisson sampling. However, there is a simple relationship between $\hat p$ and $p$ which is obtained by rescaling to correct for clock rate differences. Specifically, $p^T Q = 0^T$ implies $p^T (v - v_D) = 0^T$, which implies $p^T v_D (v_D^{-1} v) = p^T v_D$. Since the stationary distribution $\hat p$ is unique, it follows that $p^T v_D$ is a scalar multiple of $\hat p^T$, i.e.

$$\hat p^T = \frac{p^T v_D}{p^T v_D \mathbf{1}} \quad \text{or} \quad \hat p_i = \frac{p_i v_i}{\sum_j p_j v_j}.$$
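The rescaling can be checked on a small case. The sketch below uses a hypothetical three-state rate matrix (not taken from the text) and exact rational arithmetic; `jump_chain_matrix` and `rescale` are illustrative names. It rescales the ergodic distribution $p$ of Q by the exit rates and confirms that the result is invariant under $v_D^{-1} v$.

```python
from fractions import Fraction as F


def jump_chain_matrix(rates):
    """Return v_D^{-1} v, the transition matrix embedded at transition epochs."""
    return [[x / sum(row) for x in row] for row in rates]


def rescale(p, rates):
    """Return p_hat with p_hat_i = p_i v_i / sum_j p_j v_j."""
    v = [sum(row) for row in rates]
    w = [pi * vi for pi, vi in zip(p, v)]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical example: rates 0->1: 2, 1->0: 1, 1->2: 1, 2->1: 3.
rates = [[F(0), F(2), F(0)],
         [F(1), F(0), F(1)],
         [F(0), F(3), F(0)]]
p = [F(3, 11), F(6, 11), F(2, 11)]   # ergodic distribution of Q = v - v_D for these rates

p_hat = rescale(p, rates)
P = jump_chain_matrix(rates)
image = [sum(p_hat[i] * P[i][j] for i in range(3)) for j in range(3)]
print(p_hat, image)
```

States with large exit rates are over-represented in $\hat p$ relative to $p$, because the embedded chain counts visits rather than time spent.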


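These issues can be seen in the following example.

Consider the continuous time chain

[Figure: two-state chain; state 0 moves to state 1 at rate $\lambda$, and state 1 moves to state 0 at rate $\mu$]

so

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$

The ergodic probabilities are $p_0 = \mu/(\lambda+\mu)$, $p_1 = \lambda/(\lambda+\mu)$. Thus

$$v_D^{-1} v = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which corresponds to the discrete time chain

[Figure: two-state chain in which each state moves to the other with probability 1]

This chain is irreducible with $\hat p_0 = \hat p_1 = \tfrac12$. However, it is not ergodic since it is not primitive. Here, $\hat p_i \ne p_i$ if $\lambda \ne \mu$ because this discrete time chain ignores the difference in clock rates. If $\lambda = \mu$, the continuous time chain is already uniformized so that $\hat p_i = p_i = \tfrac12$. But the matrix $v_D^{-1} v$ is still imprimitive and its eigenvalues are 1 and -1.

If we pick a uniformizing rate $\hat v > \lambda, \mu$, the associated discrete time chain is

[Figure: two-state chain with transition probabilities $\lambda/\hat v$ and $\mu/\hat v$ between the states and self-loops of probability $1 - \lambda/\hat v$ and $1 - \mu/\hat v$]

This chain is irreducible and primitive, i.e. ergodic.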

In discrete time, the left eigenvector condition for $p$, $p^T A = p^T$, indicates a global balance. (Writing out the equation gives $p_i = \sum_j p_j A_{ji}$.) In continuous time, the analogous condition is with "probability flow". The condition $p^T Q = 0^T$ translates into $p_i \sum_j v_{ij} = \sum_j p_j v_{ji}$. Some chains exhibit a stronger form of balance, called detailed balance, in which there is equilibrium between every pair of states: $p_i v_{ij} = p_j v_{ji}$. In matrix terms, this is $p_D Q = Q^T p_D$, where $p_D$ is the diagonal matrix obtained from $p$. This implies that $\hat Q \triangleq p_D^{1/2} Q\, p_D^{-1/2}$ is symmetric, which in turn implies that $\hat Q$ and hence Q have real eigenvalues. Further, $\hat Q$ is diagonalizable via an orthogonal matrix $\Phi$, which implies that $\Phi p_D^{1/2}$ diagonalizes Q.

For a general chain, balance exists between any two subsets $N_1$ and $N_2$ which partition the state space, i.e.

$$\sum_{i \in N_1} \sum_{j \in N_2} p_i v_{ij} = \sum_{i \in N_2} \sum_{j \in N_1} p_i v_{ij}.$$

For birth-death processes, we can choose $N_1 = \{i \mid i \le i_0\}$, $N_2 = \{i \mid i > i_0\}$, and this balance equation is the detailed balance condition for $i_0$ and $i_0 + 1$. Since $i_0$ is arbitrary, this shows that birth-death processes always exhibit detailed balance.
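Detailed balance gives a direct recipe for the ergodic distribution of a birth-death process: $p_{n+1} = p_n \lambda_n / \mu_{n+1}$, followed by normalization. The sketch below uses hypothetical rates and illustrative function names; it builds $p$ this way and confirms the global balance condition $p^T Q = 0^T$ exactly.

```python
from fractions import Fraction


def birth_death_stationary(lam, mu):
    """lam[n] is the rate n -> n+1 and mu[n] the rate n+1 -> n, for n = 0..K-1."""
    weights = [Fraction(1)]
    for up, down in zip(lam, mu):
        # Detailed balance across the cut between n and n+1.
        weights.append(weights[-1] * Fraction(up) / Fraction(down))
    total = sum(weights)
    return [w / total for w in weights]


def generator(lam, mu):
    """Tridiagonal Q matrix with rows summing to zero."""
    K = len(lam)
    Q = [[Fraction(0)] * (K + 1) for _ in range(K + 1)]
    for n in range(K):
        Q[n][n + 1] = Fraction(lam[n])
        Q[n + 1][n] = Fraction(mu[n])
    for n in range(K + 1):
        Q[n][n] = -sum(Q[n])               # diagonal is still zero when the sum is taken
    return Q


lam, mu = [3, 2, 5], [1, 4, 2]             # hypothetical birth and death rates
p = birth_death_stationary(lam, mu)
Q = generator(lam, mu)
flow = [sum(p[i] * Q[i][j] for i in range(len(p))) for j in range(len(p))]
print(p, flow)
```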

Miscellaneous  Matrix  Theory 

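1) An irreducible nonnegative matrix is similar to a scaled stochastic matrix. To see this, let $A \ge 0$ be irreducible. Let r be the PRF root and D be the diagonal matrix obtained from a positive right eigenvector $x_R$. (Note $x_R > 0$ implies D is invertible.) Then $\frac{1}{r} D^{-1} A D$ is stochastic, since $D^{-1} A D \mathbf{1} = D^{-1} A x_R = r D^{-1} x_R = r \mathbf{1}$.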


2) Gershgorin's Theorem (See [9].)

Let M be an n x n matrix (possibly complex). Let

$$r_i \triangleq \sum_{j=1,\ j \ne i}^{n} |M_{ij}|.$$

Then the eigenvalues of M are contained in the union of the circles centered at $\{M_{ii}\}$ with respective radii $\{r_i\}$. Applying this to a Q matrix (for which $M_{ii} = -v_i$, $r_i = v_i$) gives another quick proof that nonzero eigenvalues of Q have negative real parts. Further, if D is a diagonal matrix with positive diagonal terms, then the theorem shows that the eigenvalues of D - Q have positive real parts.
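As a small illustration, the discs for a Q matrix can be computed directly. The matrix below is a hypothetical example and `gershgorin_discs` is an illustrative name. Every disc is centered at $-v_i$ with radius $v_i$, so it lies in the closed left half plane and touches the imaginary axis only at the origin.

```python
def gershgorin_discs(M):
    """Return one (center, radius) pair per row of the square matrix M."""
    n = len(M)
    return [(M[i][i], sum(abs(M[i][j]) for j in range(n) if j != i))
            for i in range(n)]


# Hypothetical generator matrix: rows sum to zero, off-diagonal entries are rates.
Q = [[-2.0, 2.0, 0.0],
     [1.0, -2.0, 1.0],
     [0.0, 3.0, -3.0]]

discs = gershgorin_discs(Q)
rightmost = [center + radius for center, radius in discs]  # largest real part in each disc
print(discs, rightmost)
```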


III. Statistics of the Speaker Process

A.  Transition  Probabilities 

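Consider the two state chain of a single speaker. The Q matrix for this process is

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$

The equilibrium distribution is binomial: $p_0 = \frac{\mu}{\lambda+\mu}$, $p_1 = \frac{\lambda}{\lambda+\mu}$.

The nonzero eigenvalue of Q is $-(\lambda+\mu)$. Let $f_{ij}(t) \triangleq \Pr[j \text{ at } t \mid i \text{ at } 0]$, i, j = 0, 1. From the fact that $f_{ij}(t)$ is of the form $a + b e^{-(\lambda+\mu)t}$, $f_{ij}(\infty) = p_j$, $f_{ij}(0) = \delta_{ij}$, it follows that

$$f_{ii}(t) = p_i + p_j e^{-(\lambda+\mu)t}$$
$$f_{ij}(t) = p_j - p_j e^{-(\lambda+\mu)t}, \quad i \ne j.$$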

For the N speaker process, the Q matrix is tridiagonal with

$$Q_{ii} = -(N-i)\lambda - i\mu$$
$$Q_{i,i+1} = (N-i)\lambda \qquad 0 \le i \le N$$
$$Q_{i,i-1} = i\mu$$

(Note that the indexing runs from 0 to N to preserve the connection with number of active speakers.) Given the single speaker transition probabilities, we can derive the N speaker probabilities using the independence. The expression is

$$P_{ij}(t) = \sum_k \underbrace{\binom{i}{k} f_{11}^k(t) f_{10}^{i-k}(t)}_{\text{term 1}} \ \underbrace{\binom{N-i}{j-k} f_{01}^{j-k}(t) f_{00}^{N-i-j+k}(t)}_{\text{term 2}}$$

where it is understood that $\binom{A}{B} = 0$ if B > A or B < 0. This expression comes from the fact that given A(0) = i, it is possible to have A(t) = j if some k of the i active at 0 are active at t and j-k of the N-i inactive at 0 are active at t, these events occurring with probabilities given by term 1 and term 2 respectively.
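The binomial expression can be evaluated directly. The sketch below is a minimal illustration: the function names `f` and `P` mirror the symbols $f_{ij}(t)$ and $P_{ij}(t)$, state 1 is taken to be the active state with $p_1 = \lambda/(\lambda+\mu)$, and the parameter values are arbitrary assumptions.

```python
from math import comb, exp


def f(i, j, t, lam, mu):
    """Single speaker transition probability f_ij(t) for states 0 (silent), 1 (active)."""
    p = [mu / (lam + mu), lam / (lam + mu)]
    decay = exp(-(lam + mu) * t)
    if i == j:
        return p[i] + p[1 - i] * decay
    return p[j] - p[j] * decay


def P(i, j, t, N, lam, mu):
    """Pr[j speakers active at t | i active at 0] for N independent speakers."""
    f11, f10 = f(1, 1, t, lam, mu), f(1, 0, t, lam, mu)
    f01, f00 = f(0, 1, t, lam, mu), f(0, 0, t, lam, mu)
    total = 0.0
    for k in range(i + 1):                  # k of the i initially active are still active
        m = j - k                           # m of the N-i initially silent are now active
        if m < 0 or m > N - i:
            continue                        # binomial coefficient is zero
        term1 = comb(i, k) * f11 ** k * f10 ** (i - k)
        term2 = comb(N - i, m) * f01 ** m * f00 ** (N - i - m)
        total += term1 * term2
    return total


N, lam, mu = 4, 1.0, 2.0
print([round(P(2, j, 0.7, N, lam, mu), 4) for j in range(N + 1)])
```

Each row sums to one, and for large t the row approaches the binomial equilibrium distribution regardless of the starting state.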

By examining the expressions for $f_{ij}(t)$ one can see that $P_{ij}(t)$ is a linear combination of the exponentials $e^{s_k t}$, where $s_k = -k(\lambda+\mu)$, $0 \le k \le N$. The $\{s_k\}$ are of course the eigenvalues of Q. The coefficient of the $s_0$ term is just

$$p_j = \binom{N}{j} \left(\frac{\lambda}{\lambda+\mu}\right)^j \left(\frac{\mu}{\lambda+\mu}\right)^{N-j},$$

the equilibrium probability that A = j.

From the discussion on detailed balance, we know that Q is similar to a symmetric matrix, which implies Q is diagonalizable. (In a more elementary way, this particular Q is diagonalizable, i.e. has a basis of eigenvectors, because it has distinct eigenvalues, and eigenvectors corresponding to distinct eigenvalues are linearly independent.) Let S = diagonal$\{s_k\}$. If L diagonalizes Q, i.e. $Q = L^{-1} S L$ (the rows of L are left eigenvectors of Q and the columns of $L^{-1}$ are right eigenvectors of Q), then $P(t) = \exp(Qt) = L^{-1} \exp(St) L$. Since S is diagonal, $[\exp(St)]_{ij} = e^{s_i t} \delta_{ij}$. So if we can find L, we can find another representation of P(t).

We now develop explicit expressions for the eigenvectors of Q. The expressions are not original (see Karlin [10]), but the derivation was obtained independently. For this reason, the discussion is not too detailed.

We wish to find a matrix L s.t. LQ = SL, i.e. the j-th row of L, L(j), satisfies $L(j) Q = s_j L(j)$. To complete the diagonalization, we then need to find $L^{-1}$. Let $\epsilon \triangleq \lambda/\mu$ and redefine Q by factoring out $\mu$ from each term, i.e. $Q \to Q/\mu$, so that now

$$Q_{ii} = -(N-i)\epsilon - i$$
$$Q_{i,i-1} = i$$
$$Q_{i,i+1} = (N-i)\epsilon$$

This factoring divides the eigenvalues by $\mu$, i.e. $s_k$ now is $-k(1+\epsilon)$, and does not affect the eigenvectors. (That is, in the expression for $P_{ij}(t)$, the coefficients of the exponentials only depend on the relative values of $\lambda$ and $\mu$.)

An eigenvector L(0) is already known to be the equilibrium probability vector, $p_i = \binom{N}{i} \frac{\epsilon^i}{(1+\epsilon)^N}$. For convenience, we take $L_i(0) = \binom{N}{i} \epsilon^i$. Notice that L(0) is an N-fold convolution of the vector $[1\ \epsilon]$ with itself, i.e. $L(0) = [1\ \epsilon]^{*N}$. (The convolution operation is associative and commutative so $[1\ \epsilon] * \cdots * [1\ \epsilon] = [1\ \epsilon]^{*N}$ is well defined.) One can check that $[1\ {-1}]$ is an eigenvector for $s_1$ in the case N = 1. We now show that the vectors $[1\ {-1}]$ and $[1\ \epsilon]$ generate the $\{L(j)\}$.

Claim: $L(j) = [1\ \epsilon]^{*(N-j)} * [1\ {-1}]^{*j}$. Our method of proof is to relate Q and L for an N speaker process to those for an N-1 speaker process. We use a superscript on matrices and vectors to indicate the number of speakers.


Define $G^N$ by

$$G^N_{ij} = \begin{cases} 1, & j \ge i \\ 0, & j < i. \end{cases}$$

One can check that $(G^N)^{-1}$ has ones on the diagonal, minus ones on the superdiagonal, and zeroes elsewhere. As a change of basis, this transformation is replacing a state j by the sum of all states $\le j$. The crucial step is then to show that

$$(G^N)^{-1} Q^N G^N = \begin{pmatrix} Q^{N-1} - (1+\epsilon) I & 0 \\ \ast & 0 \end{pmatrix},$$

where the last column is zero and $\ast$ denotes a row whose entries are not needed. This relationship can be verified by a straightforward computation, so we omit proof. We can now draw two conclusions.

• If $\phi_N(s) \triangleq \det(sI - Q^N)$ then $\phi_N(s) = s\,\phi_{N-1}(s + (1+\epsilon))$. This follows from the fact that similar matrices have the same characteristic polynomials, and an expansion of $\det(sI - (G^N)^{-1} Q^N G^N)$ along the last column. The relation between $\phi_N$ and $\phi_{N-1}$ in turn implies that $s_0 = 0$ is an eigenvalue of $Q^N$, and that if $s_j = -j(1+\epsilon)$ is an eigenvalue of $Q^{N-1}$, then $s_{j+1} = -(j+1)(1+\epsilon)$ is an eigenvalue of $Q^N$. Since the eigenvalues of $Q^1$ are 0 and $-(1+\epsilon)$, it follows that the eigenvalues of $Q^N$ are $s_j = -j(1+\epsilon)$, $0 \le j \le N$, as indicated before.

• A direct computation shows that if $L^{N-1}(j)\, Q^{N-1} = s_j L^{N-1}(j)$ then

$$[L^{N-1}(j), 0]\, (G^N)^{-1} Q^N G^N = s_{j+1} [L^{N-1}(j), 0]$$

or

$$[L^{N-1}(j), 0]\, (G^N)^{-1} Q^N = s_{j+1} [L^{N-1}(j), 0]\, (G^N)^{-1}.$$

In words, if $L^{N-1}(j)$ is an eigenvector of $s_j$ for the (N-1) speaker process, then $[L^{N-1}(j), 0] (G^N)^{-1}$ is an eigenvector of $s_{j+1}$ for the N speaker case. Since $L^N(0) = [1\ \epsilon]^{*N}$ $\forall N$, $L^1(1) = [1\ {-1}]$, and multiplication by $(G^N)^{-1}$ is equivalent to convolution with $[1\ {-1}]$, the claim follows by induction.


Remark: $[1\ \epsilon]^{*(N-j)} * [1\ {-1}]^{*j}$ can be expressed in terms of the binomial coefficients and powers of $\epsilon$. The resulting polynomial is a special case of the Krawtchouk polynomials (see Karlin [10]).

To complete the diagonalization we need to find $L^{-1}$. We now show that $L^2 = (1+\epsilon)^N I$. (We drop the superscript for number of speakers so that $L^2$ has its usual meaning.) Thus one can take $L (1+\epsilon)^{-N/2}$ for the diagonalization.

Claim: $L^2 = (1+\epsilon)^N I$

The proof is somewhat involved so we "sketch" it. First, one shows that L is also a right eigenvector matrix. This implies $L = L^{-1} D$ for some diagonal matrix D. Then one shows $D = aI$ for some constant a. These are the involved parts. Determining a is easy. $L = aL^{-1}$ implies $L^2 = aI$. The first row of L is $[1\ \epsilon]^{*N}$, and the first column of L is $\mathbf{1}$. Now $([1\ \epsilon]^{*N}) \mathbf{1} = (1+\epsilon)^N$ by construction. Thus $a = (1+\epsilon)^N$.
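Both claims are easy to check numerically for small N. The sketch below is illustrative only: the helper names are hypothetical, and an integer $\epsilon$ is chosen so that all arithmetic is exact. It builds the rows $L(j) = [1\ \epsilon]^{*(N-j)} * [1\ {-1}]^{*j}$ by repeated convolution, then compares $LQ$ with $SL$ and $L^2$ with $(1+\epsilon)^N I$ for the scaled Q matrix.

```python
def convolve(a, b):
    """Discrete convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def eigenvector_matrix(N, eps):
    """Row j is [1 eps] convolved N-j times with [1 -1] convolved j times."""
    rows = []
    for j in range(N + 1):
        row = [1]
        for _ in range(N - j):
            row = convolve(row, [1, eps])
        for _ in range(j):
            row = convolve(row, [1, -1])
        rows.append(row)
    return rows


def scaled_generator(N, eps):
    """Q with mu factored out: Q_ii = -(N-i)eps - i, Q_i,i+1 = (N-i)eps, Q_i,i-1 = i."""
    Q = [[0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        Q[i][i] = -(N - i) * eps - i
        if i < N:
            Q[i][i + 1] = (N - i) * eps
        if i > 0:
            Q[i][i - 1] = i
    return Q


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


N, eps = 4, 2
L, Q = eigenvector_matrix(N, eps), scaled_generator(N, eps)
SL = [[-j * (1 + eps) * x for x in L[j]] for j in range(N + 1)]  # s_j times row j
print(matmul(L, Q) == SL, matmul(L, L)[0])
```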

The  matrix  L  has  several  other  interesting  properties 
which  derive  from  the  rich  structure  of  the  chain.  They  seem 
to  have  little  probabilistic  significance,  so  discussion  is 
omitted. 


B. Mean First Passage Times

We mentioned that the crucial question is one of time scales. One useful characterization of this is the mean first passage time between states. Let $T_{ij} \triangleq$ mean first passage time from i to j, $T_i^+ \triangleq T_{i,i+1}$, $T_i^- \triangleq T_{i,i-1}$. For the case N = 1, $T_{01} = T_0^+ = \frac{1}{\lambda}$ and $T_{10} = T_1^- = \frac{1}{\mu}$. We can in fact derive recursive relations for these quantities for a general birth-death process. Let $\lambda_n$ and $\mu_n$ denote the birth and death rates respectively. Then

$$T_n^+ = \frac{1}{\lambda_n + \mu_n} + \frac{\mu_n}{\lambda_n + \mu_n} \left( T_{n-1}^+ + T_n^+ \right).$$

This equation says that we must

a) wait until the first exit from n, the mean of this time is $\frac{1}{\lambda_n + \mu_n}$, and

b) if this transition is to n-1, which occurs with probability $\frac{\mu_n}{\lambda_n + \mu_n}$, we must wait another $T_{n-1}^+ + T_n^+$ to first reach n+1.

Solving for $T_n^+$ yields

$$T_n^+ = \frac{1}{\lambda_n} + \frac{\mu_n}{\lambda_n} T_{n-1}^+.$$

The following induction argument shows that a solution to this recursion is

$$T_n^+ = \frac{1}{\lambda_n p_n} \sum_{j=0}^{n} p_j,$$

where the $\{p_j\}$ are the ergodic probabilities.

Proof: Detailed balance shows $\lambda_n p_n = \mu_{n+1} p_{n+1}$. The basis, $T_0^+ = \frac{1}{\lambda_0}$, is trivial. Assuming the formula true for n-1, we can substitute into the recursion to obtain

$$T_n^+ = \frac{1}{\lambda_n} + \frac{\mu_n}{\lambda_n} \left[ \frac{1}{\lambda_{n-1} p_{n-1}} \sum_{j=0}^{n-1} p_j \right]$$
$$= \frac{1}{\lambda_n} \left( 1 + \frac{1}{p_n} \sum_{j=0}^{n-1} p_j \right) \quad \text{(using detailed balance)}$$
$$= \frac{1}{\lambda_n p_n} \sum_{j=0}^{n} p_j.$$

By analogous reasoning, $T_n^- = \frac{1}{\mu_n p_n} \sum_{j=n}^{N} p_j$. (See Keilson [7] for more on the actual first passage time distributions.)

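The general mean first passage time between two states is now seen to be

$$T_{ij} = \begin{cases} \displaystyle\sum_{k=i}^{j-1} T_k^+, & j > i \\[2ex] \displaystyle\sum_{k=j+1}^{i} T_k^-, & j < i. \end{cases}$$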
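The recursion and the closed form can be compared exactly on a small speaker chain. The following is a sketch under stated assumptions: birth rates $(N-n)\lambda$ and death rates $n\mu$, integer $\lambda$ and $\mu$ chosen only for illustration, and hypothetical function names.

```python
from fractions import Fraction


def speaker_rates(N, lam, mu):
    births = [Fraction((N - n) * lam) for n in range(N + 1)]  # lambda_n
    deaths = [Fraction(n * mu) for n in range(N + 1)]         # mu_n
    return births, deaths


def stationary(births, deaths):
    """Ergodic probabilities from detailed balance: lambda_n p_n = mu_{n+1} p_{n+1}."""
    w = [Fraction(1)]
    for n in range(len(births) - 1):
        w.append(w[-1] * births[n] / deaths[n + 1])
    total = sum(w)
    return [x / total for x in w]


def up_times(births, deaths):
    """T_n^+ for n = 0..N-1 from T_n^+ = 1/lambda_n + (mu_n/lambda_n) T_{n-1}^+."""
    T = [1 / births[0]]
    for n in range(1, len(births) - 1):
        T.append(1 / births[n] + deaths[n] / births[n] * T[n - 1])
    return T


def up_times_closed(births, p):
    """T_n^+ = (sum_{j<=n} p_j) / (lambda_n p_n) for n = 0..N-1."""
    return [sum(p[: n + 1]) / (births[n] * p[n]) for n in range(len(p) - 1)]


def down_times_closed(deaths, p):
    """T_n^- = (sum_{j>=n} p_j) / (mu_n p_n) for n = 1..N; entry n-1 holds T_n^-."""
    return [sum(p[n:]) / (deaths[n] * p[n]) for n in range(1, len(p))]


def passage_time(i, j, up, down):
    """Mean first passage time T_ij as a sum of one-step times."""
    if j > i:
        return sum(up[i:j])      # T_i^+ + ... + T_{j-1}^+
    return sum(down[j:i])        # T_{j+1}^- + ... + T_i^-


births, deaths = speaker_rates(5, 1, 2)
p = stationary(births, deaths)
up, down = up_times(births, deaths), down_times_closed(deaths, p)
print(up == up_times_closed(births, p), passage_time(4, 1, up, down))
```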


For the speaker chain, we can say more. First observe that in the speaker chain, $\lambda_n$ is a decreasing sequence, and $\mu_n$ is an increasing sequence. For such a chain, one suspects that $T_n^- > T_{n+1}^-$ and $T_n^+ < T_{n+1}^+$. This is indeed the case. A simple inductive argument goes as follows. (We only prove the inductive step. The basis is a simple computation.) From the expressions derived,

$$T_n^- = \frac{1}{\mu_n} + \frac{\lambda_n}{\mu_n} T_{n+1}^-$$
$$T_{n+1}^- = \frac{1}{\mu_{n+1}} + \frac{\lambda_{n+1}}{\mu_{n+1}} T_{n+2}^-.$$

From monotonicity, $\frac{1}{\mu_n} > \frac{1}{\mu_{n+1}}$ and $\frac{\lambda_n}{\mu_n} > \frac{\lambda_{n+1}}{\mu_{n+1}}$. By the induction hypothesis, $T_{n+1}^- > T_{n+2}^-$, which implies $T_n^- > T_{n+1}^-$, thus completing the induction. (Similar reasoning works for $T_n^+$.)

For the speaker chain, $\lambda_n = (N-n)\lambda$ and $\mu_n = n\mu$. The birth and death rates become equal at $n = \bar A = N\lambda/(\lambda+\mu)$, the mean number of active speakers. For $n > \bar A$, a transition to n+1 becomes an "uphill battle" of increasing difficulty. (Similarly for $T_n^-$, $n < \bar A$.) Thus one suspects that the time to go from $\bar A$ to $\bar A \pm O(N)$ becomes quite large as $N \to \infty$. An analysis of this has been done by Bellman and Harris [11]. They show that the actual distribution approaches an exponential with a mean that grows very quickly with N. Of more interest to us are mean passage times towards $\bar A$, especially from above.

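From the previous results we can derive some simple bounds. Recall

$$T_n^- = \frac{1}{\mu_n p_n} \sum_{j=n}^{N} p_j$$
$$T_n^- = \frac{1}{\mu_n} + \frac{\lambda_n}{\mu_n} T_{n+1}^-.$$

The first equation shows $T_n^- \ge \frac{1}{\mu_n}$. Combining the second one with the inequality $T_{n+1}^- \le T_n^-$ (shown above) shows $T_n^- \le \frac{1}{\mu_n - \lambda_n}$ for n s.t. $\mu_n > \lambda_n$, i.e. for $n > \bar A$. Let $n_1 > n_2 > \bar A$ be given. Then the lower and upper bounds show

$$T_{n_1,n_2} \ge \sum_{j=n_2+1}^{n_1} \frac{1}{\mu_j} = \frac{1}{\mu} \sum_{j=n_2+1}^{n_1} \frac{1}{j}$$
$$T_{n_1,n_2} \le \sum_{j=n_2+1}^{n_1} \frac{1}{\mu_j - \lambda_j} = \frac{1}{\mu(1+\epsilon)} \sum_{j=n_2+1}^{n_1} \frac{1}{j - \bar A}, \quad \text{where } \epsilon = \frac{\lambda}{\mu}.$$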

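These harmonic sums can easily be bounded in terms of the logarithm. Specifically, if x and y are positive integers, with x < y, then

$$\log\left(\frac{y+1}{x}\right) \le \sum_{j=x}^{y} \frac{1}{j} \le \log\left(\frac{y}{x-1}\right).$$

(Logs are base e.) Applying these bounds to the previous bounds on passage times we obtain

$$\frac{1}{\mu} \log\left(\frac{n_1+1}{n_2+1}\right) \le T_{n_1,n_2} \le \frac{1}{\mu(1+\epsilon)} \log\left(\frac{n_1 - \lceil \bar A \rceil}{n_2 - \lceil \bar A \rceil}\right),$$

where $\lceil\ \rceil$ is the ceiling function.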
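The logarithmic bounds on a harmonic sum can be spot-checked numerically. This is a minimal sketch with hypothetical helper names; it assumes $x \ge 2$ so that the upper bound $\log(y/(x-1))$ is finite.

```python
from math import log


def harmonic(x, y):
    """Sum of 1/j for j = x..y."""
    return sum(1.0 / j for j in range(x, y + 1))


def log_bounds(x, y):
    """Lower and upper bounds obtained by comparing the sum with integrals of 1/t."""
    return log((y + 1) / x), log(y / (x - 1))


for x, y in [(2, 5), (3, 100), (10, 11)]:
    lower, upper = log_bounds(x, y)
    print(x, y, lower, harmonic(x, y), upper)
```

The gap between the two bounds shrinks as x grows, which is the regime of interest when both end points are of order N.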


We now investigate these bounds as N, $n_1$, $n_2$ approach infinity, and $n_1$ and $n_2$ have some specified growth relative to the mean. That is, we assume $n_1$ and $n_2$ approach infinity as

$$n_1 = \lceil \bar A \rceil + f_1(N)$$
$$n_2 = \lceil \bar A \rceil + f_2(N)$$

where $f_1(N) \ge f_2(N) \ge 1$.

Remarks:

• To avoid technicalities, we only consider "asymptotically commensurate" functions, i.e. for any functions g and h that we compare, we assume the ratio g(N)/h(N) goes to $\infty$, to 0, or to a positive limit, as $N \to \infty$. In the last case, we write g(N) = O(h(N)). If $g(N)/h(N) \to 1$, we write $g(N) \sim h(N)$. Clearly, $f_1(N)$ and $f_2(N)$ are at most O(N), and $f_2/f_1 \to \gamma$ where $0 \le \gamma \le 1$.

• We neglect all "integer roundoff" errors, e.g. $\bar A \approx \lceil \bar A \rceil$ and $\frac{n_1+1}{n_2+1} \approx \frac{n_1}{n_2}$ since $\bar A = O(N)$.

n2+l  n2 


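With this behavior for $n_1$ and $n_2$ specified, the bounds become

$$\frac{1}{\mu} \log\left(\frac{\bar A + f_1}{\bar A + f_2}\right) \le T_{n_1,n_2} \le \frac{1}{\mu(1+\epsilon)} \log\left(\frac{f_1}{f_2}\right).$$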


The asymptotic behavior of these bounds is as follows:

Lower Bound

• If either $f_1 \sim f_2$ or $f_1(N)/N \to 0$, the bound approaches 0 since the argument of the log $\to 1$. In the latter case, this is because $\bar A$ dominates $f_1$ and $f_2$.

• In the other cases, the bound approaches the positive constant

$$\frac{1}{\mu} \log\left[ \frac{1 + f_1/\bar A}{1 + f_2/\bar A} \right].$$

Upper Bound

• This is constant only if $f_1 = O(f_2)$. If $f_1 \sim f_2$, the constant is zero, otherwise it is positive.

• Since $f_1 \ge f_2$, $f_1/f_2$ cannot vanish. Thus the only other possibility is $f_1/f_2 \to \infty$, in which case the bound grows as $\log(f_1/f_2)$.

Thus:

• Both bounds approach 0 if $f_1 \sim f_2$.

• If $f_1 = O(N)$, $f_2 = O(N)$, but $f_1 \not\sim f_2$, the bounds approach (different) positive constants.

• Otherwise, they disagree in asymptotic behavior.

The upper bound is somewhat "closer to the truth" in the following sense. (The proof of these results is in appendix A.) Suppose n tends to infinity as $\bar A + f(N)$. Then

Region I: $f(N)/\sqrt{N} < \infty$

Then $T_n^- = O(1/\sqrt{N})$.

Region II: $f(N) = x_N \sqrt{N}$ where $x_N \to \infty$ but $x_N/\sqrt{N} \to 0$

Then $T_n^- = O(1/(x_N \sqrt{N}))$.

Region III: f(N) = O(N)

Then $T_n^- = O(1/N)$.

Now recall that the bounds on the one-step passage time $T_n^-$ are

$$\frac{1}{n\mu} = \frac{1}{\mu_n} \le T_n^- \le \frac{1}{\mu_n - \lambda_n} = \frac{1}{\mu(1+\epsilon)} \cdot \frac{1}{n - \bar A}.$$

Thus the one-step upper bound is correct in regions II and III, whereas the one-step lower bound is correct only in region III. (Since the two bounds do agree in region III, O(1/N) must be the correct behavior. We say more about the constant in the appendix.)

IV. Effective Service Times

Given a data service rate vector, $r = (r_0, \ldots, r_N)$, a message length x, and an initial phase state i, we would like to know the distribution of time it takes to complete service and the phase state at completion (jointly). We first consider the simplest form of this problem in which N = 1, $r_0 = 1$, and $r_1 = 0$, i.e. we consider a "Markovian server" who is either "on" or "off".

In this case (because $r_1 = 0$), the problem is equivalent to finding the total time T that must elapse until the time accumulated in state 0 is equal to x, given a start in i. Let

$$H_i(t,x) \triangleq \Pr[T \le t \mid i \text{ and } x], \quad i = 0, 1.$$

To find $H_i$, we first compute a related quantity. If the chain starts in state i, and we observe it for a time t, what is the amount of time w spent in state 0? Let

$$F_i(t,x) \triangleq \Pr[w \le x \mid i \text{ and observation for } t].$$

F and H are related as follows. The event $(T \le t \mid x, i)$ occurs iff the event $(w \ge x \mid i, t)$ occurs. Therefore $H_i(t,x) = 1 - F_i(t,x) + \Pr[w = x \mid i, t]$. (There may be impulses so we have to worry about the part of the distribution at w = x.)

Before computing $F_i$, we make some observations about $H_i$. First, if the chain starts in i = 1, the amount of time needed to finish x is the amount of time needed to reach i = 0 (exponentially distributed with mean $\mu^{-1}$) plus the amount of time needed to finish x given a start in i = 0. Thus $H_1 = H_0 * \mu e^{-\mu t}$, where $*$ denotes convolution in t. Second, the event $(T < x \mid 0)$ is impossible while the event $(T = x \mid 0, x)$ occurs with probability $e^{-\lambda x}$ (the probability that the chain first leaves i = 0 after time x). Thus the function $H_0(t,x)$ has the following general shape

[Figure: sketch of $H_0(t,x)$ as a function of t, with a jump of height $e^{-\lambda x}$ at t = x]

By similar reasoning, $F_0(t,x) = 1$ for $x \ge t$ and has a jump of height $e^{-\lambda t}$ at $x = t^-$.

[Figure: sketch of $F_0(t,x)$ as a function of x]

And $F_1(t,0) = e^{-\mu t}$, the probability that the chain leaves state 1 after time t so that no time in 0 is accumulated.

[Figure: sketch of $F_1(t,x)$ as a function of x]


$F_0$ and $F_1$ are related by the following integral equations which can be obtained by conditioning arguments.

$$F_0(t,x) = \int_0^t \lambda e^{-\lambda v} F_1(t-v, x-v)\,dv + e^{-\lambda t} I(x \ge t)$$
$$F_1(t,x) = \int_0^t \mu e^{-\mu v} F_0(t-v, x)\,dv + e^{-\mu t} I(x \ge 0)$$

where I( ) is the indicator function.

Taking the Laplace-Stieltjes transform on x and the Laplace transform on t ($t \leftrightarrow z$, $x \leftrightarrow s$) gives

$$\hat F_0(z,s) = \hat F_1(z,s) \frac{\lambda}{\lambda + s + z} + \frac{1}{\lambda + s + z}$$
$$\hat F_1(z,s) = \hat F_0(z,s) \frac{\mu}{\mu + z} + \frac{1}{\mu + z}$$

which imply

$$\hat F_0(z,s) = \frac{\lambda + \mu + z}{(\lambda + s + z)(\mu + z) - \lambda\mu}.$$

As one check on correctness, we can compute the expected amount of time spent in state 0 as a function of the observation time t, and then take its Laplace transform. This should equal

$$-\frac{\partial}{\partial s} \hat F_0(z,s) \Big|_{s=0}.$$

The expected time spent in 0 given a start in 0 is

$$\int_0^t E[\,I(\text{state}(v) = 0) \mid \text{state}(0) = 0\,]\,dv = \int_0^t p_{00}(v)\,dv,$$

where $p_{00}(v)$ is the transition probability, $p_{00}(v) = p_0 + p_1 e^{-(\lambda+\mu)v}$. (See Chap. III.) Thus the expected time is $p_0 t + \frac{p_1}{\lambda+\mu}\left[1 - e^{-(\lambda+\mu)t}\right]$.

This expression can be interpreted as $p_0 t$ ($p_0$ = equilibrium fraction of time spent in 0) plus a "bias" term reflecting the start in 0. The asymptotic effect of this bias is to add a constant $\frac{\lambda}{(\lambda+\mu)^2}$.

The Laplace transform of this is

$$\frac{\mu}{\lambda+\mu} \cdot \frac{1}{z^2} + \frac{\lambda}{(\lambda+\mu)^2} \left[ \frac{1}{z} - \frac{1}{\lambda + \mu + z} \right],$$

and one can verify equality.
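The equality can also be confirmed numerically. The sketch below is illustrative and assumes the double transform $\hat F_0(z,s) = (\lambda+\mu+z)/((\lambda+s+z)(\mu+z) - \lambda\mu)$; the function names and the sample parameter values are hypothetical. It approximates $-\partial \hat F_0/\partial s$ at s = 0 by a central difference and compares it with the Laplace transform of the expected time in state 0.

```python
def F0_hat(z, s, lam, mu):
    """Assumed double transform of F_0(t, x): Laplace in t, Laplace-Stieltjes in x."""
    return (lam + mu + z) / ((lam + s + z) * (mu + z) - lam * mu)


def minus_dF_ds_at_zero(z, lam, mu, h=1e-6):
    """Central difference approximation of -dF0_hat/ds evaluated at s = 0."""
    return -(F0_hat(z, h, lam, mu) - F0_hat(z, -h, lam, mu)) / (2 * h)


def mean_time_transform(z, lam, mu):
    """Laplace transform of p0 t + (p1/(lam+mu)) (1 - exp(-(lam+mu) t))."""
    p0, p1 = mu / (lam + mu), lam / (lam + mu)
    return p0 / z ** 2 + (p1 / (lam + mu)) * (1 / z - 1 / (lam + mu + z))


for z in (0.3, 1.0, 4.0):
    print(z, minus_dF_ds_at_zero(z, 1.5, 0.7), mean_time_transform(z, 1.5, 0.7))
```

Both expressions reduce algebraically to $(\mu+z)/(z^2(\lambda+\mu+z))$, which is what the numerical agreement reflects.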

Inverting the double transform $\hat F_0(z,s)$ on the s variable yields

$$\hat f_0(z,x) = \frac{\lambda + \mu + z}{\mu + z} \exp\left( -z\, \frac{\lambda + \mu + z}{\mu + z}\, x \right).$$

The inversion on the z variable is considerably more tedious. The answer is

$$f_0(t,x) = \begin{cases} e^{-\lambda x} \delta(t-x) + e^{-\lambda x} e^{-\mu(t-x)} \left[ \phi(t,x) \right], & 0 \le x \le t \\ 0, & x > t \end{cases}$$

where

$$\phi(t,x) = \lambda I_0(2\sqrt{u}) + \lambda\mu x\, \frac{I_1(2\sqrt{u})}{\sqrt{u}}, \quad u = \lambda\mu x (t-x),$$

and $I_0$ and $I_1$ are modified Bessel functions. Because we took the Laplace-Stieltjes transform on the x variable, $f_0(t,x)$ is a probability density as a function of x for t fixed.

From  the  previous  discussion  then>HQCt,x)  *  /°°_  fQ(t,v)dv. 

There  appears  to  be  no  simple  form  for  this  integral.  However, 
one  can  compute  the  mean  of  T,  i.e.  the  mean  total  time  needed 
to  complete  the  amount  of  work  x  given  a  start  in  0.  From 
probability  theory  this  is  [1  -  H0(t,x)dt.  The  evaluation 

is  tedious,  so  we  only  give  the  answer  --  (1  +  -jj)x.  This  may  be 
somewhat  surprising  at  first  in  that  there  is  no  "bias"  term. 

The following argument shows that this lack of bias and linearity occur because $r_1 = 0$ and because of the memoryless property of the exponential distribution. Consider the time to complete an amount of work 2x, given a start in state 0. Because $r_1 = 0$, work is only done in state 0, so that when the first x is completed, the chain must be in state 0. Now suppose the first x is completed after some time u has elapsed since the chain entered state 0 on the visit of completion. (This need not be the initial visit.) Because of the memoryless property, the time to complete the second x is independent of the past, i.e. is independent of u and the number of visits. Thus the time to complete the second x is a probabilistic replica of the time to complete the first x. Since x is arbitrary, this shows that the mean completion time is linear in the amount of work, given a start in state 0. For large x, we would expect that the mean time to complete x is approximately $x/p_0 = (1 + \frac{\lambda}{\mu})x$. Combining this with linearity, it follows that $(1 + \frac{\lambda}{\mu})x$ must be the exact expression.
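The linearity of the mean completion time can be illustrated by simulation. The following Python sketch is an illustration under stated assumptions, not the author's computation: work is done at unit rate in state 0 and at rate 0 in state 1, sojourns in states 0 and 1 are exponential with rates $\lambda$ and $\mu$, and the function names, sample size and seed are hypothetical choices.

```python
import numpy as np

def completion_time(x, lam, mu, rng):
    """One sample of the time needed to finish work x, given a start in state 0."""
    elapsed, remaining = 0.0, x
    while True:
        sojourn = rng.exponential(1.0 / lam)   # length of the current visit to state 0
        if sojourn >= remaining:               # the work is finished during this visit
            return elapsed + remaining
        elapsed += sojourn
        remaining -= sojourn
        elapsed += rng.exponential(1.0 / mu)   # idle period in state 1, no work done

def mean_completion_time(x, lam, mu, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    return float(np.mean([completion_time(x, lam, mu, rng) for _ in range(n)]))

lam, mu = 1.0, 2.0
for x in (1.0, 2.0):
    # Sample mean next to the claimed exact value (1 + lam/mu) * x.
    print(x, mean_completion_time(x, lam, mu), (1 + lam / mu) * x)
```

Doubling x should roughly double the sample mean, with no additive constant, in line with the argument above.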

We can in fact obtain the Laplace-Stieltjes transform of $H_0(t,x)$. Let $\hat H_0(z,x) = \int_0^\infty e^{-zt}\,d_t H_0(t,x)$.

Recall that $H_0(t,x) = \int_x^\infty f_0(t,v)\,dv$. For t fixed, $f_0(t,v)$ is a pdf having an impulse of weight $e^{-\lambda t}$ at v = t, and is 0 for v > t. For v < t, it has a term (the Bessel function part) which we call $g_0(t,v)$, defined for v < t. Then

$$H_0(t,x) = \begin{cases} e^{-\lambda t}\, I(t \ge x) + \int_x^t g_0(t,v)\,dv, & x < t \\ 0, & x > t \end{cases}$$


Again the calculation is laborious but straightforward, so details are omitted. The answer is

$$\hat H_0(z,x) = \exp\left(-\frac{z(z+\lambda+\mu)}{z+\mu}\,x\right).$$

Now we may consider x itself to be random with distribution B(x). In terms of a queue, the time needed to complete x is the effective "service time" of a message which has a random length distributed as B(x). (If the message arrives when the system is empty, it may find the server in state 1. In this case, the service time will have an additional component: the time for the server to return to state 0.) The Laplace-Stieltjes transform of the effective service time starting in state 0 is

$$\int_0^\infty dB(x)\int_0^\infty e^{-zt}\,d_t H_0(t,x) = \int_0^\infty \hat H_0(z,x)\,dB(x) = \hat B\left(\frac{z(z+\lambda+\mu)}{z+\mu}\right),$$

where $\hat B(z) = \int_0^\infty e^{-zx}\,dB(x)$. The mean effective service time is

$$-\frac{d}{dz}\,\hat B\left(\frac{z(z+\lambda+\mu)}{z+\mu}\right)\Big|_{z=0} = \left(1 + \frac{\lambda}{\mu}\right)\bar x, \quad \text{where } \bar x = \int_0^\infty x\,dB(x).$$

This is consistent with previous results. If $B(x) \sim \exp(\xi)$, the formula for the transform is $\dfrac{\xi(z+\mu)}{(z+\mu)(z+\xi) + \lambda z}$.


Because of the memoryless property of the exponential distribution, we can derive the last expression more directly. Let $P_i(t) = \Pr[T \le t \mid i,\ x \sim B(x)]$, i = 0, 1.

Then

$$P_0(t) = \int_0^t \lambda e^{-\lambda v}\,\big[\Pr(x < v) + \Pr(x > v)\,\underline{\Pr(T \le t-v \mid 1,\ x > v)}\big]\,dv + \int_t^\infty \lambda e^{-\lambda v}\,\Pr[x < t]\,dv.$$

The crucial simplification is that the underscored term is just $P_1(t-v)$, because of memorylessness. Hence

$$P_0(t) = \int_0^t \lambda e^{-\lambda v}\,\big[1 - e^{-\xi v} + e^{-\xi v}\,P_1(t-v)\big]\,dv + (1 - e^{-\xi t})\,e^{-\lambda t}$$

$$P_1(t) = \int_0^t \mu e^{-\mu v}\,P_0(t-v)\,dv.$$

Taking Laplace transforms and solving for $\hat P_0(z)$ yields $\hat P_0(z) = \dfrac{\xi(z+\mu)}{(z+\mu)(z+\xi) + \lambda z}\cdot\dfrac{1}{z}$. As $\hat P_0(z)$ is the transform of the distribution, we conclude that the transform of the density is $z\hat P_0(z)$, which agrees with the previous expression.
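The agreement of the two routes can be illustrated numerically. The following Python sketch is an illustration only; the function names and parameter values are hypothetical. It checks that composing the exponential transform $\xi/(\xi+s)$ with $s = z(z+\lambda+\mu)/(z+\mu)$ reproduces the closed form, and estimates the mean effective service time from a central difference of the transform at z = 0.

```python
from fractions import Fraction

def lst_exponential(s, xi):
    # Laplace-Stieltjes transform of an exponential distribution with rate xi
    return xi / (xi + s)

def service_lst_composed(z, lam, mu, xi):
    # B_hat evaluated at z (z + lam + mu) / (z + mu)
    return lst_exponential(z * (z + lam + mu) / (z + mu), xi)

def service_lst_closed_form(z, lam, mu, xi):
    return xi * (z + mu) / ((z + mu) * (z + xi) + lam * z)

# Exact comparison at a hypothetical point.
lam, mu, xi = Fraction(1), Fraction(2), Fraction(3)
z = Fraction(1, 4)
print(service_lst_composed(z, lam, mu, xi) == service_lst_closed_form(z, lam, mu, xi))

# Mean effective service time, estimated as minus the slope of the transform at z = 0.
h = 1e-6
mean_estimate = -(service_lst_closed_form(h, 1.0, 2.0, 3.0)
                  - service_lst_closed_form(-h, 1.0, 2.0, 3.0)) / (2 * h)
print(mean_estimate, (1 + 1.0 / 2.0) / 3.0)
```

For exponential lengths the mean length is $1/\xi$, so the estimate should be close to $(1 + \lambda/\mu)/\xi$.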

The completion time analysis for exponential length messages can be extended to the general case of N speakers and arbitrary service rates $\{r_i\}$. Let $T_i(\xi)$ denote the completion time of a message whose length is exponentially distributed with mean $\xi^{-1}$, given a start in phase state i. And let $\hat T(z,\xi) = (\hat T_0(z,\xi),\ldots,\hat T_N(z,\xi))^T$ denote the vector of Laplace-Stieltjes transforms. Then a conditioning argument similar to the one above shows

$$[zI + \xi r_D - Q]\,\hat T(z,\xi) = \xi r$$

where Q is the generator for the speech process, and $r_D$ is the diagonal matrix obtained from the service rate vector r.

Using the transform to extract moments, we obtain $\bar T(\xi) = (\bar T_0(\xi),\ldots,\bar T_N(\xi))^T = (\xi r_D - Q)^{-1}\mathbf{1}$ and $\overline{T^2}(\xi) = (\overline{T_0^2}(\xi),\ldots,\overline{T_N^2}(\xi))^T = 2(\xi r_D - Q)^{-2}\mathbf{1}$. If a message initiates service in a state chosen according to the speaker equilibrium distribution, p, then its mean service time is $p^T\bar T(\xi)$, and the variance is $p^T\overline{T^2}(\xi) - (p^T\bar T(\xi))^2$. It is possible to show that if $\lambda, \mu \to \infty$ with $\lambda/\mu$ remaining fixed, then $\hat T(z,\xi)$ approaches $\dfrac{\xi\bar r}{z + \xi\bar r}\,\mathbf{1}$. That is, the service rate effectively becomes deterministic at the average rate $\bar r = r^T p$, so that for any i, $T_i$ approaches an exponential with mean $(\xi\bar r)^{-1}$ in distribution. This is because the speaker process passes through its states "infinitely" often during a service, and the fraction of time it spends in state i approaches $p_i$.
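The moment formulas lend themselves to direct computation. The following NumPy sketch is an illustration under stated assumptions rather than the author's procedure: the speaker generator is built from the rates $(N-i)\lambda$ and $i\mu$, the equilibrium distribution is taken to be binomial with per-speaker activity probability $\lambda/(\lambda+\mu)$, and the function names and parameter values are hypothetical.

```python
import numpy as np
from math import comb

def speaker_generator(N, lam, mu):
    """Generator Q of the number of active speakers: i -> i+1 at (N-i)*lam, i -> i-1 at i*mu."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = (N - i) * lam
        if i > 0:
            Q[i, i - 1] = i * mu
        Q[i, i] = -Q[i].sum()          # diagonal makes each row sum to zero
    return Q

def speaker_equilibrium(N, lam, mu):
    """Binomial equilibrium distribution (assumes independent two-state speakers)."""
    a = lam / (lam + mu)
    return np.array([comb(N, i) * a**i * (1 - a)**(N - i) for i in range(N + 1)])

def completion_moments(r, xi, Q):
    """First and second moments of the completion time for every starting state."""
    M = np.linalg.inv(xi * np.diag(r) - Q)
    first = M @ np.ones(len(r))        # (xi r_D - Q)^{-1} 1
    second = 2.0 * (M @ first)         # 2 (xi r_D - Q)^{-2} 1
    return first, second

# Two-state case (N = 1, work done only in state 0), with hypothetical rates.
N, lam, mu, xi = 1, 1.0, 2.0, 3.0
r = np.array([1.0, 0.0])
Q = speaker_generator(N, lam, mu)
p = speaker_equilibrium(N, lam, mu)
first, second = completion_moments(r, xi, Q)
mean = p @ first
variance = p @ second - mean**2
print(first, mean, variance)
```

In the two-state case the first component of `first` should equal $(1 + \lambda/\mu)/\xi$, the mean obtained earlier for a start in state 0.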

The previous characterization immediately generalizes if we seek the joint distribution of completion time and the speaker state at completion. Let $T_{ij}(t,\xi) = \Pr[\text{message is completed within time } t;\ \text{the completion state is } j \mid \text{initial state } i]$ and let $\hat T_{ij}(z,\xi)$ be the Laplace-Stieltjes transform. (Note $T_{ij}(t,\xi)$ is a possibly defective distribution, i.e. $\int_0^\infty d_t T_{ij}(t,\xi) < 1$, since completion need not occur at j.) Then a conditioning argument shows

$$[zI + \xi r_D - Q]\,\hat T(z,\xi) = \xi r_D.$$

Note that $\hat T(0,\xi)$ is a transition probability matrix on the speaker state space itself, i.e. $\hat T_{ij}(0,\xi) = \Pr[\text{message completed in } j \mid \text{start in } i]$. We will use $\hat T(z,\xi)$ to compute waiting times in Chapter VI.
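As an illustration, the matrix equation can be solved numerically and the stochastic property of $\hat T(0,\xi)$ observed directly. The sketch below is not from the original text; the function name, the choice N = 2 and all rate values are hypothetical.

```python
import numpy as np

def completion_state_matrix(r, xi, Q, z=0.0):
    """Solve [z I + xi r_D - Q] T_hat = xi r_D for the matrix of transforms T_hat(z, xi)."""
    r_D = np.diag(r)
    return np.linalg.solve(z * np.eye(len(r)) + xi * r_D - Q, xi * r_D)

# Hypothetical example: N = 2 speakers, lam = 1, mu = 2, service rates r = (2, 1, 0).
lam, mu = 1.0, 2.0
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, 2 * mu, -2 * mu]])
r = np.array([2.0, 1.0, 0.0])
T0 = completion_state_matrix(r, xi=1.0, Q=Q)
print(T0)
print(T0.sum(axis=1))   # each row sums to one at z = 0
```

The column for the state with zero service rate is identically zero, since a message cannot complete while no work is being done.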
V. The M/M Case

A.  Orientation 

In this chapter, we assume that data arrivals are Poisson with rate $\eta$, and message lengths are $\sim \exp(\xi)$. $r = (r_0,\ldots,r_N)$ denotes the vector of service rates. With these assumptions, the vector process [A(t), K(t)], where K(t) is the number of messages in the system, is a Markov process. Its state diagram is a two-dimensional grid with a state (i,k) having transition rates

$(i,k) \to (i+1,k)$ at rate $\lambda_i = (N-i)\lambda$

$(i,k) \to (i-1,k)$ at rate $\mu_i = i\mu$

$(i,k) \to (i,k+1)$ at rate $\eta$

$(i,k) \to (i,k-1)$ at rate $\xi r_i$ if k > 0


Notice that the last transition rate has units of messages/sec. (as it must) and depends on $r_i$ and $\xi$ only through their product. For convenience, we take $\xi = 1$.

In looking at the diagram, we notice that if a column is viewed as a superstate, the "column process" is a multidimensional analog of the classical M/M/1 birth-death process in that the transitions into and out of any column have the same structure (except at K = 0 where the process is truncated). The voice/data model is in fact a special case of the more general quasi-birth-death (QBD) or column continuous processes. These are time-homogeneous, discrete-state, bivariate Markov chains [X(t),Y(t)] for which

1) A transition $(x_1,y_1) \to (x_2,y_2)$ can occur only if $|y_1 - y_2| \le 1$, whence the term column continuous. (Also called skip-free left and right.)

2) The rate of a transition within a column, $(x_1,y) \to (x_2,y)$, is independent of y.

The QBD processes for which the inter-column rates, i.e. $(x_1,y) \to (x_2,y\pm 1)$, are also independent of y (except at boundaries) constitute an important subset. We term these homogeneous QBD processes with the understanding that complete spatial homogeneity might not be present because of boundaries. The voice/data model belongs to this subset and is even more restricted because $(x_1,y) \to (x_2,y\pm 1)$ can occur only if $x_1 = x_2$. This restriction coupled with 2) shows that the marginal process X(t) (in our case A(t)) is a Markov process.

In this sense, the speaker process is an "independent" phase process "modulating" the transitions of K (though A and K are generally not statistically independent). Notice that in the general QBD process, the marginal process X(t) is not Markov because the rate of transitions $(x_1,y) \to (x_2,y\pm 1)$ can depend on y. An exception to this is the subset of truly homogeneous QBD processes, i.e. those in which the state space of Y is all the integers. For in this case, changes in X are independent of Y.

The homogeneous QBD processes in general can be viewed as multidimensional analogs of the classical M/M/1 birth-death process (i.e. even if $(x_1,y) \to (x_2,y\pm 1)$, $x_1 \ne x_2$ is allowed).

Thus one might expect ergodic distributions that are matrix analogs of the geometric or truncated geometric ergodic distributions of the M/M/1 queue. This is indeed the case. (Perhaps we should say that, a posteriori, the following makes sense.)

Assume [X,Y] is a homogeneous QBD process in which the state space of X is {0,1,...,N} and that of Y is {0,1,2,...}. Let $e(y) = (e_{0y},\ldots,e_{Ny})^T$ denote the vector of the joint ergodic probabilities for the $y^{th}$ column. Then we will show that

$$e^T(y) = e^T(0)\,\theta^y$$

where $\theta$ is similar to a strictly substochastic matrix. (Hence $\theta$'s eigenvalues are inside the unit circle, as they must be for this to be meaningful.) This form for the solution is apparently due to Evans and Wallace. (Our historical information was obtained through Neuts [12] and Keilson [13]. For more references and a more complete study of matrix-geometric methods see Neuts [12].) Independent derivations were subsequently obtained by Keilson and Neuts. The approach was brought to our attention by Keilson, who was working on similar Markov models in other contexts when we brought the voice/data problem to him. An exposition of some of his results can be found in [14]. (In this paper, the roles of columns and rows are reversed from our usage, so he terms them row continuous.)

In the remainder of this chapter, we explore the application of the matrix-geometric method to the voice/data model.

As indicated, the general theoretical questions have largely been solved. However, the voice/data model is a restricted case, and we have been able to characterize certain quantities in some detail. To maintain the continuity of development, we do not always explicitly separate those theoretical results that are particular to the voice/data model (and hence new to us). Where possible, we provide references for previously known results. Undoubtedly, we have overlooked some authors and apologize for this. But the queuing theory literature is so vast and diffuse that a complete literature search is not appropriate unless one is doing a survey paper. This was not our intent.

Our development will use Keilson's approach as a theoretical guideline. This is based on the so-called compensation method. (See Keilson [15].) Before proceeding with this, we briefly discuss the z-transform approach.

One first defines the partial z-transforms $F_i(z) = \sum_{k=0}^\infty e_{ik}z^k$, and then, using the global balance equations (which are difference equations of degree 2 in both the i and k coordinates), obtains the relation $F^T(z)A(z) = e^T(0)B(z)$, where $F(z) = (F_0(z),\ldots,F_N(z))^T$, and the entries of A(z) and B(z) are polynomials in z of degree 2 or less. Thus $F^T(z) = e^T(0)B(z)A^{-1}(z)$.

The matrix $A^{-1}(z)$ has poles in the region |z| < 1, and one can, in principle, solve for e(0) by using the requirement that F(z) must be analytic for |z| < 1. (The analyticity condition places N constraints on e(0). The (N+1)st comes from a "conservation" equation.)

This approach shows that the $F_i(z)$ are rational functions and hence that the solution is a sum of geometrically decaying terms. (The decay rates must be eigenvalues of $\theta$.) However, one cannot easily extract from the transform solution the manner in which the (possibly complex) roots combine into a real matrix representation $\theta$. Also, the numerical inversion of the z-transforms is less attractive than the methods to be presented.

The transform approach was our first approach, and later we found that the same results had been obtained by Yechiali and Naor [16], [17].

B. Stability and Existence of Ergodic Distributions

In the general study of queues and backlog processes,
"stability"  is  usually  present  iff  the  "service  rate"  exceeds 
the  "arrival  rate".  The  precise  mathematical  formulation  of 
these  notions  depends  on  the  particular  model.  Typically,  a 
weak  notion  of  stability  is  that  the  emptying  of  the  queue  or 
backlog  should  be  a  recurrent  event.  One  can  then  further 
require,  for  example,  that  the  waiting  times  converge  in 
distribution  to  a  random  variable  having  a  certain  number  of 
moments  etc.  For  many  classes  of  queuing  models,  the  various 
requirements  are  equivalent,  but  it  is  not  our  purpose  to 
discuss  this  general  problem. 

For Markov chain queuing models, stability means that the Markov chain is ergodic. For finite Markov chains, ergodicity reduces to the purely structural question of irreducibility, but for infinite chains one must further require some "net return force toward the origin". In the case of Markov queuing models, this translates into the arrival rate < service rate condition. The voice/data model poses no structural barriers since there is a path between any two states provided that at least one $r_i > 0$. The other condition is satisfied if what we term the drift, $\eta - \sum_i p_i r_i = \eta - p^T r$, is negative, and we assume this throughout the development. As our interest is in computing various statistics and not in "existence" results, we do not formally prove that the negative drift condition is necessary and sufficient. At various places, "plausibility arguments" will become apparent, e.g. as the drift approaches 0, the maximal eigenvalue of $\theta$ approaches 1. For a formal proof and general discussion of queues with dependent interarrival or service times, the reader is referred to Loynes [18], [19], [20].
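The drift condition is simple to evaluate for a given configuration. The following Python sketch is an illustration only; it assumes $\xi = 1$ as in the text, assumes a binomial equilibrium distribution for the number of active speakers (each speaker active with probability $\lambda/(\lambda+\mu)$), and uses hypothetical function names and parameter values.

```python
from math import comb

def speaker_equilibrium(N, lam, mu):
    """Binomial equilibrium distribution of the number of active speakers (an assumption)."""
    a = lam / (lam + mu)
    return [comb(N, i) * a**i * (1 - a)**(N - i) for i in range(N + 1)]

def drift(eta, r, N, lam, mu):
    """Arrival rate minus the equilibrium average service rate, eta - p.r (with xi = 1)."""
    p = speaker_equilibrium(N, lam, mu)
    return eta - sum(p_i * r_i for p_i, r_i in zip(p, r))

# Hypothetical configuration: 2 speakers, service rates falling as more speakers are active.
print(drift(1.0, [2.0, 1.0, 0.0], N=2, lam=1.0, mu=2.0))   # negative drift
print(drift(2.0, [2.0, 1.0, 0.0], N=2, lam=1.0, mu=2.0))   # positive drift
```

Under the negative drift assumption made in the text, only configurations of the first kind are considered.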
C. Derivation of Solution

First consider the process $[A(t), K^H(t)]$ obtained by removing the boundary at zero and extending the columns out to $-\infty$. This process is spatially homogeneous in the k variable because the transition rates depend only on A. Because of homogeneity, the transition probability $\Pr[A(t) = j, K^H(t) = k_1 \mid A(0) = i, K^H(0) = k_0]$ depends only on $k_1 - k_0$. We take $K^H = 0$ as the origin and define g(k,t) by

$$g_{ij}(k,t) = \Pr[A(t) = j,\ K^H(t) = k \mid A(0) = i,\ K^H(0) = 0].$$

The asymptotic behavior of $K^H(t)$ as $t \to \infty$ depends on the drift $(\eta\mathbf{1} - r)^T p$. If the drift is negative, the homogeneous process drifts to $K^H = -\infty$.

Define the k step right first passage time density matrix $s^+(k,t)$ by

$$s^+_{ij}(k,t)\,dt = \Pr[K^H = k \text{ is first reached at } (t,t+dt) \text{ and } A(t) = j \mid A(0) = i,\ K^H(0) = 0]$$

for $k \ge 1$. The special case $s^+(1,t)$ is denoted by $s^+(t)$. Similarly define $s^-(k,t)$ for k steps left, i.e. $k \le -1$. Using conditioning arguments, one can show

$$s^+(k,t) = \underbrace{s^+(t) * \cdots * s^+(t)}_{k \text{ times}}$$

$$s^-(k,t) = s^-(t) * \cdots * s^-(t)$$

$$g(k,t) = \begin{cases} s^+(k,t) * g(t), & k \ge 1 \\ s^-(k,t) * g(t), & k \le -1 \end{cases} \qquad g(t) = g(0,t)$$

where * denotes the matrix-convolution product,

$$[a(t) * b(t)]_{ij} = \sum_k \int_0^t a_{ik}(\tau)\,b_{kj}(t-\tau)\,d\tau.$$

Let $\sigma^+(k,\omega)$, $\sigma^-(k,\omega)$, $\chi(k,\omega)$ be the Laplace transforms of $s^+(k,t)$, $s^-(k,t)$, $g(k,t)$ respectively. Further let $\sigma^+(\omega) \triangleq \sigma^+(1,\omega)$; $\sigma^-(\omega) \triangleq \sigma^-(-1,\omega)$; $\chi(\omega) \triangleq \chi(0,\omega)$ and let $\sigma^+ \triangleq \sigma^+(\omega = 0)$; $\sigma^- \triangleq \sigma^-(\omega = 0)$, $\chi \triangleq \chi(\omega = 0)$. Note, for example, $\sigma^+ = \int_0^\infty s^+(t)\,dt$, so that the $i^{th}$ row-sum of $\sigma^+$ is the probability that the homogeneous process ever reaches the set $\{(\cdot,1)\}$ given that it starts in (i,0). (The set $\{(\cdot,i)\}$ denotes the set of states in the column $K^H = i$.) An analogous interpretation applies to $\sigma^-$. Because the drift is negative, $\sigma^+$ is strictly substochastic as the process may never reach $K^H = 1$ from $K^H = 0$. Similarly, $\sigma^-$ is stochastic since the process $K^H$ does eventually decrease, with probability one.

In the transform domain, the previous relations become

$$\sigma^+(k,\omega) = [\sigma^+(\omega)]^k$$

$$\sigma^-(k,\omega) = [\sigma^-(\omega)]^{|k|}$$

$$\chi(k,\omega) = \begin{cases} [\sigma^+(\omega)]^k\,\chi(0,\omega), & k \ge 1 \\ [\sigma^-(\omega)]^{|k|}\,\chi(0,\omega), & k \le -1 \end{cases}$$

Now to the crux.

Keilson [15] has shown that if a boundary is inserted in the homogeneous process at $K^H = 0$ then

1) There exists a "compensation measure" r(k), $-\infty < k < \infty$, s.t. e(k) is a convolution in k of $\chi(k)$ and r(k). ($\chi(k) = \chi(k,\omega = 0) = \int_0^\infty g(k,t)\,dt$.) Specifically

$$e^T(k) = \sum_{l=-\infty}^{\infty} r^T(l)\,\chi(k-l), \quad k \ge 0.$$

2) r(k) = 0 except at k = -1 and k = 0, so that

$$e^T(k) = r^T(0)\,\chi(k) + r^T(-1)\,\chi(k+1).$$

3) r(-1) + r(0) = 0, r(-1) < 0, so that

$$e^T(k) = r^T(0)\,[\chi(k) - \chi(k+1)].$$

Using the previous relations we then obtain

$$e^T(k) = r^T(0)\,[(\sigma^+)^k - (\sigma^+)^{k+1}]\,\chi$$

$$= r^T(0)\,[I - \sigma^+]\,(\sigma^+)^k\,\chi$$

$$= r^T(0)\,[I - \sigma^+]\,\chi\,\chi^{-1}(\sigma^+)^k\,\chi$$

$$= e^T(0)\,\theta^k$$

where $e^T(0) = r^T(0)[I - \sigma^+]\chi$ and $\theta = \chi^{-1}\sigma^+\chi$. Notice that $\theta$ is similar to the strictly substochastic matrix $\sigma^+$. Hence the eigenvalues of $\theta$ are strictly inside the unit circle. This implies that the series $\sum_{l=0}^\infty \theta^l$ is convergent and equals $[I - \theta]^{-1}$. From the relation $\sum_{k=0}^\infty e(k) = p$, we then obtain $e^T(0) = p^T(I - \theta)$. Thus we do not need to explicitly find the compensation measure r(0) if $\theta$ can be found. At this point we need to find $\chi$ and $\sigma^+$. It turns out that $\chi$ is easily obtainable once $\sigma^+$ and $\sigma^-$ are known, so we first proceed with $\sigma^+$ and $\sigma^-$.


Remark: The compensation technique also applies if K has a boundary at some positive integer, say M. In this case, the compensation measure also has mass at k = M and k = M+1. Thus one obtains a term of the form $r^T(M)[\chi(k-M) - \chi(k-M-1)]$, and this can be expressed in terms of $\chi$ and powers of $\sigma^-$.

D. Calculation of $\sigma^+$ and $\sigma^-$

For the homogeneous process define the matrix $\alpha$ by

$$\alpha_{ij} = \Pr[K^H \text{ departs } K^H = 0 \text{ by going to } K^H = 1,\ A = j \text{ when this occurs} \mid A(0) = i,\ K^H(0) = 0].$$

Similarly define $\beta$ for going to $K^H = -1$. Notice the distinction between $\alpha$ and $\sigma^+$ and $\beta$ and $\sigma^-$. The $\sigma$'s are first passage probabilities whereas $\alpha$ and $\beta$ are the $K^H = +1$, $K^H = -1$ probabilities when $K^H$ leaves 0 (either to $K^H = +1$ or -1). If either $\eta \ne 0$ or $r \ne 0$, then $\alpha + \beta$ is stochastic since the process must eventually leave $K^H = 0$. The matrix $\alpha + \beta$ is in fact a transition probability matrix (on the speaker state space) for the embedded discrete time chain defined at the instants of changes in $K^H$.

As one suspects, there is a relationship among the $\alpha$, $\beta$, and $\sigma$'s. Specifically

$$\sigma^+ = \alpha + \beta(\sigma^+)^2$$

$$\sigma^- = \beta + \alpha(\sigma^-)^2$$

The first equation says that the first passage to $K^H = 1$ may occur a) at the first departure from $K^H = 0$, the $\alpha$ term, or b) if the first departure is to $K^H = -1$, reflected by the $\beta$ term, then two first passages to the right are required to reach $K^H = +1$. Because of homogeneity, the transition from -1 to 0 has the same probabilistic structure as the one from 0 to 1. Thus the $(\sigma^+)^2$ appears. A similar interpretation applies to $(\sigma^-)^2$.

The iteration $\sigma_{n+1} = \alpha + \beta\sigma_n^2$, $\sigma_0 = 0$ (or reverse $\alpha$ and $\beta$ for $\sigma^-$) does converge to the probabilistically correct solution. This algorithm is discussed later.
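A minimal sketch of this iteration follows, as an illustration rather than the author's implementation. It assumes $\xi = 1$, takes $\alpha = (\eta_D + r_D - Q)^{-1}\eta_D$ and $\beta = (\eta_D + r_D - Q)^{-1}r_D$ with $\eta_D = \eta I$, and uses a hypothetical stopping rule and hypothetical parameter values (N = 2 speakers, negative drift).

```python
import numpy as np

def departure_matrices(eta, r, Q):
    """alpha and beta: where the process lands when K^H first leaves 0 (up or down)."""
    n = len(r)
    A = eta * np.eye(n) + np.diag(r) - Q
    return np.linalg.solve(A, eta * np.eye(n)), np.linalg.solve(A, np.diag(r))

def first_passage(a, b, tol=1e-14, max_iter=100000):
    """Iterate sigma <- a + b sigma^2 from sigma = 0 until successive iterates agree."""
    sigma = np.zeros_like(a)
    for _ in range(max_iter):
        nxt = a + b @ sigma @ sigma
        if np.max(np.abs(nxt - sigma)) < tol:
            return nxt
        sigma = nxt
    return sigma

# Hypothetical parameters: lam = 1, mu = 2, eta = 0.5, service rates r = (2, 1, 0.5).
lam, mu, eta = 1.0, 2.0, 0.5
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, 2 * mu, -2 * mu]])
r = np.array([2.0, 1.0, 0.5])
alpha, beta = departure_matrices(eta, r, Q)
sigma_plus = first_passage(alpha, beta)     # first passage one column to the right
sigma_minus = first_passage(beta, alpha)    # roles of alpha and beta reversed
print(sigma_plus.sum(axis=1))               # row sums below one
print(sigma_minus.sum(axis=1))              # row sums equal to one
```

Starting from zero keeps every iterate nonnegative, and with negative drift the limit for $\sigma^-$ has unit row sums while the limit for $\sigma^+$ does not.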

The matrices $\alpha$ and $\beta$ can easily be found. By a conditioning argument

$$\alpha_{ij} = \frac{\lambda_i}{\nu_i}\,\alpha_{i+1,j} + \frac{\mu_i}{\nu_i}\,\alpha_{i-1,j} + \frac{\eta}{\nu_i}\,\delta_{ij}$$

where $\nu_i = \eta + r_i + \lambda_i + \mu_i$ = total exit rate out of a state $(i,\cdot)$ and $\delta_{ij}$ = Kronecker delta. In matrix form these equations are

$$\alpha = (\eta_D + r_D - Q)^{-1}\,\eta_D$$

where Q is the generator for the speaker process. Similarly $\beta = (\eta_D + r_D - Q)^{-1}\,r_D$. We can put these matrices into a form which more clearly indicates their probabilistic meaning. Let $\nu_D = (\nu_i\delta_{ij})$ and let $\tilde Q = Q + \mathrm{diag}(\lambda_i + \mu_i)$. Then we can write the equations as

$$\alpha = (I - \nu_D^{-1}\tilde Q)^{-1}\,\nu_D^{-1}\,\eta_D$$

$$\beta = (I - \nu_D^{-1}\tilde Q)^{-1}\,\nu_D^{-1}\,r_D$$

Essentially, we are considering the embedded discrete time chain. The matrix $\nu_D^{-1}\tilde Q$ is a strictly substochastic matrix giving the probabilities that when a transition occurs, the $K^H$ coordinate does not change. Analogously $\nu_D^{-1}\eta_D$ and $\nu_D^{-1}r_D$ give the probabilities that when a transition occurs, $K^H$ goes to +1, -1 respectively. Since $\nu_D^{-1}\tilde Q$ is strictly substochastic,

$$(I - \nu_D^{-1}\tilde Q)^{-1} = \sum_{l=0}^\infty (\nu_D^{-1}\tilde Q)^l.$$

The probabilistic meaning of these equations as accounting identities is as follows. A departure to $K^H = +1$, for example, occurs when there are, say, l transitions at which $K^H$ does not change (this is reflected by the $(\nu_D^{-1}\tilde Q)^l$ term) followed by a departure to $K^H = +1$, for which $\nu_D^{-1}\eta_D$ is the appropriate transition matrix. A simple calculation will indicate that $\alpha + \beta = (I - \nu_D^{-1}\tilde Q)^{-1}\,\nu_D^{-1}(\eta_D + r_D)$ is stochastic.
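The embedded-chain form can be illustrated numerically. The NumPy sketch below is an illustration only, with hypothetical names and parameter values and $\xi = 1$ assumed; it builds the three one-step blocks of the embedded chain, forms $\alpha$ and $\beta$ from them, and compares a truncated geometric series with the matrix inverse.

```python
import numpy as np

def embedded_chain_blocks(eta, r, Q):
    """One-step probabilities of the embedded chain: (K^H unchanged, K^H up, K^H down)."""
    speaker_rate = -np.diag(Q)                  # lambda_i + mu_i
    nu = eta + r + speaker_rate                 # total exit rate out of a state (i, .)
    Q_tilde = Q + np.diag(speaker_rate)         # generator with its diagonal removed
    stay = Q_tilde / nu[:, None]                # speaker state changes, K^H does not
    up = np.diag(eta / nu)
    down = np.diag(r / nu)
    return stay, up, down

# Hypothetical parameters: N = 2 speakers, lam = 1, mu = 2, eta = 0.5, r = (2, 1, 0.5).
lam, mu, eta = 1.0, 2.0, 0.5
Q = np.array([[-2 * lam, 2 * lam, 0.0],
              [mu, -(mu + lam), lam],
              [0.0, 2 * mu, -2 * mu]])
r = np.array([2.0, 1.0, 0.5])
stay, up, down = embedded_chain_blocks(eta, r, Q)
resolvent = np.linalg.inv(np.eye(3) - stay)
alpha, beta = resolvent @ up, resolvent @ down
series = sum(np.linalg.matrix_power(stay, l) for l in range(200))
print((alpha + beta).sum(axis=1))               # each row sums to one
print(np.max(np.abs(series - resolvent)))       # truncated series matches the inverse
```

The row sums of `stay`, `up` and `down` together equal one for every state, which is the accounting behind the stochasticity of $\alpha + \beta$.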

So far, we have not really relied on the particular properties of the voice/data model, except in using the relation $\sum_{k=0}^\infty e(k) = p$ to express e(0) in terms of $\theta$ and p. The procedure, in fact, does work for general, homogeneous QBD processes. In the general case, Q is the generator for some Markov chain defined on {0,1,...,N}, and $\eta_D$ and $r_D$ are replaced by general transition rate matrices $\eta$ and r, e.g. $\eta_{i_1 i_2}$ is the rate at which $(i_1,k) \to (i_2,k+1)$ transitions occur. We later show that e(0) can be characterized as a left PRF eigenvector of a stochastic matrix in the general case, and e(0) is then uniquely specified by the requirement that $\sum_{i,k} e_{ik} = 1$ or $e^T(0)[I - \theta]^{-1}\mathbf{1} = 1$. But as indicated, the relation $\sum_{k=0}^\infty e(k) = p$, where p is the stationary distribution of the Q matrix in question, is generally not true since changes in the marginal "row process" are not totally accounted for by this Q. Also, the simple drift criterion $\eta - r^T p < 0$ is replaced by a more general condition.

For the voice/data model, we can prove the following assertions, assuming $\eta > 0$, r > 0.

1) $\alpha > 0$, $\beta > 0$

2) $\sigma^+ > 0$, $\sigma^- > 0$

3) The eigenvalues of $\alpha$, $\beta$ and $\alpha + \beta$ are real and positive.

4) The eigenvalues of $\sigma^+$ and $\sigma^-$ are real and positive.

Proof:

1) In the speaker chain, any state A = j is reachable from any state A = i. If $\eta > 0$, then the transition $(j,0) \to (j,1)$ can occur $\forall\, j$. So the sample paths that start in (i,0), wander in the set $\{(\cdot,0)\}$ until j is reached, and then go to (j,1) at the next transition, have positive probability, i.e. $\alpha_{ij} > 0$, $\forall\, i,j$. Similarly, r > 0 implies $\beta > 0$.

2) From the matrix quadratic equations for the $\sigma$'s, it follows that $\sigma^+ \ge \alpha$, $\sigma^- \ge \beta$, so 2) follows from 1).

Remark: It follows from 1) and 2) and our discussion in Chapter II that $\alpha$, $\beta$, $\sigma^+$, and $\sigma^-$ are primitive. Properties 1) and 2) are not too special. In the general homogeneous QBD process, they will follow from the irreducibility of the transition structure within a column and the condition $\mathbf{1}^T\eta > 0^T$ and $\mathbf{1}^T r > 0^T$, where $\eta$ and r are the general rate matrices. The condition $\mathbf{1}^T\eta > 0^T$ simply means that for each j, there is some i s.t. an $(i,0) \to (j,1)$ transition is possible. (Similarly for r.)

3) We use the following lemma.

Lemma: If $M_1$ and $M_2$ are two positive definite matrices (symmetry is included in our definition), then the eigenvalues of the product $M_1 M_2$ are real and positive. (Note $M_1 M_2$ need not be positive definite.)

Proof: $M_1 M_2 x = \epsilon x$ implies $M_2 x = \epsilon M_1^{-1} x$, which implies $x^{*T} M_2 x = \epsilon\, x^{*T} M_1^{-1} x$, where * = complex conjugate. Now, $M_1$ positive definite implies $M_1^{-1}$ positive definite, so by the definition of positive definite $x^{*T} M_2 x$ and $x^{*T} M_1^{-1} x$ are real and positive. Thus $\epsilon$ must be real and positive.
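The lemma can be illustrated numerically. The sketch below is an illustration only and not a proof; the construction of the test matrices as $BB^T + I$, the dimension and the seed are hypothetical choices.

```python
import numpy as np

def random_positive_definite(n, rng):
    """Symmetric positive definite matrix of the form B B^T + I."""
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)

rng = np.random.default_rng(1)
M1 = random_positive_definite(4, rng)
M2 = random_positive_definite(4, rng)
product = M1 @ M2
eigenvalues = np.linalg.eigvals(product)
print(eigenvalues)                            # real and positive, up to rounding
print(np.allclose(product, product.T))        # the product itself is generally not symmetric
```

The product is not symmetric, so its positive spectrum is a consequence of the lemma rather than of positive definiteness of the product.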

(If either $M_1$ or $M_2$ is only semi-definite, a similar result holds with > replaced by $\ge$, though the proof is more complicated.)

Now let $A = (\eta_D + r_D - Q)$ and $\hat A = p_D^{1/2} A\, p_D^{-1/2}$, where $p_D$ is the diagonal matrix obtained from the speaker ergodic distribution p. By Gershgorin's theorem, the eigenvalues of A have positive real parts, and hence the eigenvalues of $\hat A$ and $\hat A^{-1}$ must also. By our remarks on detailed balance, $p_D^{1/2} Q\, p_D^{-1/2}$ is symmetric, and hence $\hat A$ and $\hat A^{-1}$ are, since $\eta_D$ and $r_D$ are diagonal. Thus $\hat A^{-1}$ is positive definite. Now $\alpha = A^{-1}\eta_D$, which implies $p_D^{1/2}\alpha\, p_D^{-1/2} = p_D^{1/2} A^{-1} p_D^{-1/2}\,\eta_D = (\hat A)^{-1}\eta_D$, and similarly for $\beta = A^{-1} r_D$ and $\alpha + \beta = A^{-1}(\eta_D + r_D)$. By assumption, $\eta > 0$, r > 0, so $\eta_D$, $r_D$, and $\eta_D + r_D$ are positive definite.

Invoking the lemma, we can conclude that $\alpha$, $\beta$, and $\alpha + \beta$ are similar to matrices having real positive eigenvalues.

4) We only prove the result for $\sigma^+$ since the proof for $\sigma^-$ is the same with the roles of $\alpha$ and $\beta$ reversed. For convenience, we drop the "+" from $\sigma^+$.

By previous remarks, $\sigma = \alpha + \beta\sigma^2$, so $\bar\sigma = \bar\alpha + \bar\beta\bar\sigma^2$, where $\bar M = p_D^{1/2} M p_D^{-1/2}$, for M = $\alpha$, $\beta$, or $\sigma$. Now suppose x is an eigenvector of $\bar\sigma$ with eigenvalue $\epsilon$. Then we obtain $\epsilon x = \bar\alpha x + \epsilon^2\bar\beta x$. Recall from the proof of 3) that $\bar\alpha = (\hat A)^{-1}\eta_D$ and $\bar\beta = (\hat A)^{-1} r_D$, where $\hat A$ is positive definite. Thus $\epsilon x = (\hat A)^{-1}(\eta_D + \epsilon^2 r_D)x$, which implies $\epsilon\hat A x = (\eta_D + \epsilon^2 r_D)x$.

Multiplying both sides on the left by $x^{*T}$, we obtain $c_1\epsilon = c_0 + c_2\epsilon^2$, where $c_1 = x^{*T}\hat A x$, $c_2 = x^{*T} r_D x$, $c_0 = \eta$. (We assume x has unit norm.) From the definition of $\hat A$ it follows that $c_1 = c_0 + c_2 + c_3$, where $c_3 = -x^{*T} p_D^{1/2} Q\, p_D^{-1/2} x$. From our remarks on detailed balance, $p_D^{1/2} Q\, p_D^{-1/2}$ is symmetric. Since Q has nonpositive eigenvalues, it follows that $c_3 \ge 0$. Further, $c_0$ and $c_2$ are positive since $\eta > 0$, r > 0. A straightforward application of the quadratic formula then shows that both roots of the polynomial must be real and positive.

Remark: This indicates that $(\sigma^+)^k$ will not exhibit "oscillatory behavior" as $k \to \infty$.

E. Calculation of $\chi$

The transition probability matrix g(t), $g_{ij}(t) = \Pr[A(t) = j, K^H(t) = 0 \mid A(0) = i, K^H(0) = 0]$, satisfies the integro-differential equations

$$\dot g(t) = g(t)[Q - \eta_D - r_D] + (g(t)\eta_D) * s^-(t) + (g(t)r_D) * s^+(t)$$

$$\dot g(t) = g(t)[Q - \eta_D - r_D] + s^-(t) * (g(t)\eta_D) + s^+(t) * (g(t)r_D).$$

(* = matrix-convolution product)

These equations can be derived using the conditioning arguments relating g(t+dt) to g(t). In both equations, the first term comes from the fact that (j,0) can be reached at t + dt by being in (j-1,0), (j,0), (j+1,0) at time t and then having a transition up, no transition, transition down, respectively.

The matrix $[Q - \eta_D - r_D]$ incorporates these possibilities.

The other terms are in general different in the two equations since matrix multiplication (and hence *) is not commutative.

The two equations derive from different conditioning arguments.

We sketch the two for the "$\eta_D s^-$" term.

In the first equation, "$\eta_D s^-$" is obtained as follows.

We wish to compute $g_{ij}(t+dt)$. One "set of paths" starting in (i,0) at 0 and reaching (j,0) at (t+dt) is the following. At some intermediate time $t - \tau$, the process is in state $(l,0)$ with probability $g_{il}(t-\tau)$. In the next dt it goes to $(l,1)$ with probability $\eta\,dt$. Then the first return to the set $\{(\cdot,0)\}$ occurs $\tau, \tau + d\tau$ later, and the entrance state is (j,0), with probability $s^-_{lj}(\tau)\,d\tau$. Therefore $g_{ij}(t+dt)$ contains a term of the form $\sum_l \eta\,dt\int_0^t g_{il}(t-\tau)\,s^-_{lj}(\tau)\,d\tau$, so that $\dot g_{ij}(t)$ contains a term $\eta\sum_l g_{il} * s^-_{lj}$. Note that there is no double counting because $s^-_{lj}$ is a first passage time density.

In the second equation, the term is obtained as follows.

The state (j,0) can be reached at t + dt if the process is in (j,-1) at time t, this occurring with probability $g_{ij}(-1,t)$, and then a transition to (j,0) occurs (with probability $\eta\,dt$). But from previous results $g(-1,t) = s^-(t) * g(t)$.

Transforming  the  first  equation  yields 


$-I = \gamma(\omega)[Q - \omega I - \eta_D - r_D] + \gamma(\omega)[\eta_D\,\sigma^-(\omega) + r_D\,\sigma^+(\omega)]$

As $\gamma = \gamma(\omega = 0)$, we conclude

$\gamma = [\eta_D + r_D - Q - \eta_D\,\sigma^- - r_D\,\sigma^+]^{-1}$

The existence of $\gamma$, i.e. the invertibility of the matrix in
brackets, can now be shown. Let $v_i = \eta + r_i + (N-i)\lambda + i\mu$,
$v_D = \mathrm{diag}(v_i)$. Then the matrix in question can be rewritten
as $v_D[I - B]$, where $B = v_D^{-1}(\hat Q + \eta_D\,\sigma^- + r_D\,\sigma^+)$ and $\hat Q = Q +
\mathrm{diagonal}((N-i)\lambda + i\mu)$. $B$ is an irreducible nonnegative matrix,
and because $\sigma^+$ is strictly substochastic, $B$ is also. Thus
$\mathrm{sp}(B) < 1$ and $I - B$ is invertible.


Now $\gamma_{ij} = \int_0^\infty g_{ij}(t)\,dt = E\left[\int_0^\infty dt\, I(K^H(t) = 0,\ A(t) = j \mid K^H(0) = 0,\
A(0) = i)\right]$, where $I(\cdot)$ is the indicator function. That is, $\gamma_{ij}$
is the expected amount of time that $[A(t), K^H(t)]$ spends in
$(j,0)$ given a start in $(i,0)$. With negative drift, $K^H(t) \to -\infty$
a.s., so that the column $K^H = 0$ is transient. The existence
of $\gamma$ shows that $g_{ij}(t)$ goes to 0 rapidly enough to be integrable.
From this, it follows that $\gamma(\omega) = \int_0^\infty e^{-\omega t} g(t)\,dt$ is finite
as well for $\mathrm{Re}\ \omega > 0$.

F. Alternate Characterization of $e(0)$

For the bounded process, let $f(t)$ be the transition
probability matrix for the $K = 0$ column:

$f_{ij}(t) = \Pr[A(t) = j,\ K(t) = 0 \mid A(0) = i,\ K(0) = 0]$.

If the drift is negative, the column $K = 0$ is recurrent in the
bounded process, and $f_{ij}(t) \to e_{j0}$ as $t \to \infty$, independent of $i$.
$f(t)$ satisfies the following differential equation, which can be
derived similarly to the one for $g(t)$:

$\dot f(t) = f(t)[Q - \eta_D] + (f(t)\eta_D) * s^-(t)$.

Note the $r_D$ terms are not present (except through $s^-(t)$) because
of the boundary. Also, for the bounded process, the one step
left (i.e. $k \to k-1$) first passage time density for any $k \ge 1$
is the same as that in the homogeneous process because neither
is bounded above. Thus the $s^-(t)$ from the homogeneous process
is the correct quantity to use in the above equation. It
follows that $f(\infty) = \mathbf{1}\, e^T(0)$ and $e(0)$ is a left eigenvector:

$e^T(0)[Q - \eta_D + \eta_D\,\sigma^-] = 0^T$.

From the discussion on uniformizing a chain, $e(0)$ is also a left
eigenvector with eigenvalue 1 of the matrix
$I + \frac{1}{\nu}(Q - \eta_D) + \frac{1}{\nu}\eta_D\,\sigma^- = \alpha^0 + \alpha^+\sigma^- \equiv A_\nu$, where
$\nu \ge \max_i(\eta + (N-i)\lambda + i\mu)$. Since $\sigma^-$ is stochastic, the matrix $A_\nu$
is stochastic and is a transition probability matrix for the
discrete time chain embedded at clock "bong" instants when the
process enters a state in the column $K = 0$. That is, $(A_\nu)_{ij}$
is the probability that at the next clock "bong" for which
the chain enters a state $(\cdot,0)$, it enters $(j,0)$, given a start
in $(i,0)$. Note that this may be the very next clock bong,
with probabilities given by the $\alpha^0$ term. (Again $(\alpha^0)_{ii} > 0$ if
$\nu > \eta + (N-i)\lambda + i\mu$, so $(i,0)$ itself can be the "entrance" state
at the next bong.) Or, at the next clock time, the chain may go
to $(i,1)$, with probability $\eta/\nu$, in which case the state of return
to $\{(\cdot,0)\}$ is determined by the $\sigma^-_{ij}$'s.
G. Algorithm for Computing $\sigma^+$ and $\sigma^-$

(Much of this discussion is due to Zachmann [21] and
Neuts [12].)

Let $S_0$ be the set of substochastic matrices ($A\mathbf{1} \le \mathbf{1}$),
$S_1$ the set of stochastic matrices ($A\mathbf{1} = \mathbf{1}$), and $S_2$ the set
of strictly substochastic matrices ($A\mathbf{1} < \mathbf{1}$). Note
$S_0 \supset S_1 \cup S_2$, and the containment is proper. We now study
the equation $\sigma = \alpha + \beta\sigma^2$, where $\sigma \in S_0$, $\alpha > 0$, $\beta > 0$,
$\alpha + \beta \in S_1$. (Technically, the assumption $\alpha > 0$, $\beta > 0$ could be
replaced by $\alpha \ge 0$, $\beta \ge 0$, and weaker assumptions about the
irreducibility/primitivity of $\alpha$ or $\beta$ or even just $\alpha + \beta$,
depending on the result. For our purposes, such technical
details are of little added value, so we assume $\alpha > 0$, $\beta > 0$.)

Let $f$ be the function $f(\sigma) = \alpha + \beta\sigma^2$. Note that if
$\sigma \in S_i$, then $f(\sigma) \in S_i$, $i = 0,1,2$. By Brouwer's Fixed Point
Theorem, $f$ has at least one fixed point in $S_1$. ($S_1$ is compact,
and $f$ is continuous.) We investigate other solutions of
probabilistic significance.

Consider the iteration $\sigma_{k+1} = f(\sigma_k)$. We denote $k$ applications
of the iteration by $A^k(\sigma_0)$, so $\sigma_k = A^k(\sigma_0)$. Let $\pi$ denote the left
eigenvector of the stochastic matrix $\alpha + \beta$ associated with the
PRF root 1, s.t. $\pi^T\mathbf{1} = 1$. Recall that $\alpha + \beta$ has an
interpretation as a transition probability matrix of an embedded
chain, so $\pi$ is its stationary distribution. The convergence
properties of the algorithm are as follows:
1) If $\sigma_0 = 0$ then the algorithm converges to a limiting
matrix $A^\infty(0) \in S_0$, and $A^\infty(0)$ is a fixed point of $f$.

2) If $\pi^T(\alpha - \beta)\mathbf{1} > 0$ then $A^\infty(0) \in S_1$ and is the unique fixed
point of $f$ in $S_0$. (Thus it is the one guaranteed by
Brouwer's theorem.) Further, for any $\sigma_0 \in S_0$, $A^\infty(\sigma_0) = A^\infty(0)$.

3) If $\pi^T(\alpha - \beta)\mathbf{1} < 0$, then $A^\infty(0)$ is the unique fixed point
in $S_2$. (If $\sigma_0 \ne 0$, the algorithm need not converge to
the desired strictly substochastic solution. For example,
if $\sigma_0$ is stochastic then $A^k(\sigma_0)$ is stochastic $\forall\ k$.)
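
The three properties above can be checked numerically on a toy case. The sketch below is not the thesis's program: the 2-phase matrices `a` and `b` are hypothetical values chosen only so that `a + b` is stochastic, and the helper names `natural_iteration` and `drift` are ours.

```python
import numpy as np

def natural_iteration(alpha, beta, tol=1e-12, max_iter=100_000):
    """Iterate sigma_{k+1} = alpha + beta sigma_k^2 starting from sigma_0 = 0."""
    sigma = np.zeros_like(alpha)
    for k in range(1, max_iter + 1):
        new = alpha + beta @ sigma @ sigma
        if np.max(np.abs(new - sigma)) <= tol:   # successive-difference stop
            return new, k
        sigma = new
    raise RuntimeError("iteration did not converge")

def drift(alpha, beta):
    """delta = pi^T (alpha - beta) 1, pi = stationary vector of alpha + beta."""
    w, v = np.linalg.eig((alpha + beta).T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return float(pi @ (alpha - beta) @ np.ones(alpha.shape[0]))

# Hypothetical 2-phase data: a + b is stochastic, a has the larger row sums.
a = np.array([[0.5, 0.1], [0.2, 0.4]])
b = np.array([[0.3, 0.1], [0.1, 0.3]])

sigma_up, k_up = natural_iteration(a, b)       # alpha = a: positive drift (property 2)
sigma_down, k_down = natural_iteration(b, a)   # alpha = b: negative drift (property 3)

print("delta =", drift(a, b), "row sums:", sigma_up.sum(axis=1))
print("delta =", drift(b, a), "row sums:", sigma_down.sum(axis=1))
```

With positive drift the limit has unit row sums, while with negative drift the row sums stay strictly below one.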

We now sketch the proof of these properties.

1. One can show that $A^{k+1}(0) \ge A^k(0)$. Since $f(\sigma) \in S_0$
if $\sigma \in S_0$, the sequence $A^k(0)$ is bounded above, componentwise.
Thus it must converge componentwise to a limit. This limit
is clearly a fixed point.

2. One can also show that for any $\sigma_0 \in S_0$, $A^k(\sigma_0) \ge A^k(0)$.
Thus if the limit $A^\infty(0)$ is stochastic, $A^k(\sigma_0)$ must also
converge to $A^\infty(0)$. In particular, if $\sigma$ is a fixed point,
then $\sigma = A^\infty(\sigma) = A^\infty(0)$, so $A^\infty(0)$ is the unique fixed point
in $S_0$ if it is stochastic. The hypothesis $\pi^T(\alpha - \beta)\mathbf{1} > 0$ is
needed to prove $A^\infty(0)$ stochastic.

The quantity $\delta = \pi^T(\alpha - \beta)\mathbf{1}$ also has an interpretation as
a drift. The $i$th component of $(\alpha - \beta)\mathbf{1}$ is the difference between
the probabilities of going from $(i,k) \to \{(\cdot,k+1)\}$ and going
from $(i,k) \to \{(\cdot,k-1)\}$. The $\pi$ appropriately weights the phase
state $i$ at which the departure from the column occurs.

The condition $\delta < 0$ is the requirement for the ergodicity
of the general, homogeneous QBD process with boundary at 0.
For the voice/data model, it reduces to the previous stability
condition $\eta < p^T r$. (This was pointed out to us by Humblet [22].)
We now show this. $\pi$ is the ergodic distribution of the discrete
time chain embedded at epochs when $K^H$ changes. For the voice/data
model, $A(t)$ does not change when $K^H$ changes, so by our remarks
in Chapter II, $\pi$ is proportional to $(\eta_D + r_D)p$, i.e. when
$A(t) = i$, changes in $K^H$ occur at rate $\eta + r_i$, so $\pi_i$ is $p_i$ weighted
by this factor. With normalization,

$\pi_i = \dfrac{p_i(\eta + r_i)}{\sum_i p_i(\eta + r_i)}$.

The condition $\pi^T(\alpha + \beta) = \pi^T$ implies

$\pi^T\left(I - (\eta_D + r_D)^{-1}Q\right)^{-1}(\eta_D + r_D)^{-1}(\eta_D + r_D) = \pi^T$,

which implies

$\pi^T\left(I - (\eta_D + r_D)^{-1}Q\right)^{-1} = \pi^T$.

(Note this equation implies $\pi^T(\eta_D + r_D)^{-1}Q = 0$, which implies
$\pi$ is proportional to $(\eta_D + r_D)p$, as reasoned earlier.) Thus

$\pi^T(\alpha - \beta)\mathbf{1} = \pi^T(\eta_D + r_D)^{-1}(\eta\mathbf{1} - r) = \dfrac{\sum_i p_i(\eta - r_i)}{\sum_i p_i(\eta + r_i)} = \dfrac{\eta - p^T r}{\eta + p^T r}$.

Thus $\delta < 0$ iff $\eta < p^T r$.
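
The reduction of the drift to $(\eta - p^T r)/(\eta + p^T r)$ can be verified numerically. The sketch below makes several assumptions that are ours rather than the text's: a small population of `N = 4` speakers, illustrative rates `eta` and `r`, a binomial ergodic distribution for independent speakers, and the forms $\alpha = (\eta_D + r_D - Q)^{-1}\eta_D$, $\beta = (\eta_D + r_D - Q)^{-1}r_D$ with $\eta_D = \eta I$.

```python
import numpy as np
from math import comb

N, lam, mu = 4, 0.75, 0.81                  # small hypothetical speaker population
eta = 3.0                                    # message arrival rate (assumed value)
r = np.array([8.0, 6.0, 4.0, 2.0, 0.5])      # message service rate per state (assumed)

# Birth-death generator Q for the number of active speakers.
Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = (N - i) * lam          # a silent speaker starts a talkspurt
    if i > 0:
        Q[i, i - 1] = i * mu                 # an active speaker falls silent
    Q[i, i] = -Q[i].sum()

# Ergodic speaker distribution: binomial with talk probability lam/(lam+mu).
q = lam / (lam + mu)
p = np.array([comb(N, i) * q**i * (1 - q)**(N - i) for i in range(N + 1)])

A = np.diag(eta + r) - Q
alpha = np.linalg.solve(A, eta * np.eye(N + 1))
beta = np.linalg.solve(A, np.diag(r))

# Embedded-chain stationary vector: p_i weighted by eta + r_i.
pi = p * (eta + r)
pi = pi / pi.sum()

delta_direct = float(pi @ (alpha - beta) @ np.ones(N + 1))
delta_formula = float((eta - p @ r) / (eta + p @ r))
print(delta_direct, delta_formula)
```

The two printed values should agree to rounding error, and both are negative here because `eta` is below the average service rate `p @ r`.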


Mathematically, $\delta$ enters the convergence proof as follows.
Let $g(x) = \mathrm{sp}(\alpha/x + x\beta)$, $x > 0$. Since $\alpha + \beta$ is stochastic,
$g(1) = 1$, $g(x)$ is strictly convex, and $g'(1) = -\delta$. The assertion
$g'(1) = -\delta$ can be verified by direct computation assuming that
$g(x)$ and the left PRF eigenvector $L(x)$ (which we take as a
row vector) are differentiable. Let $M(x) = \alpha/x + x\beta$. Note
$\alpha > 0$ (or $\beta > 0$) implies $M(x)$ primitive $\forall\ x > 0$. Thus, the
eigenvector $L(x)$ can be chosen positive and normalized. We
assume that, with this consistent choice, $L(x)$ is differentiable.
Then $L(x)M(x) = g(x)L(x)$ implies $L'(x)M(x) + L(x)[\beta - \alpha/x^2] =
g'(x)L(x) + g(x)L'(x)$. Note $M(1) = \alpha + \beta$ implies $L(1) = \pi^T$
and $M(1)\mathbf{1} = \mathbf{1}$. Substituting $x = 1$ into the equation and taking
the inner product of both sides with $\mathbf{1}$ shows

$L'(1)\mathbf{1} + \pi^T(\beta - \alpha)\mathbf{1} = g'(1) + L'(1)\mathbf{1}$, which implies $g'(1) = -\delta$.
See Kingman [23] for the convexity proof.
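
A finite-difference check of $g'(1) = -\delta$ is easy to run. The matrices below are hypothetical 3-phase values chosen only so that `alpha + beta` is stochastic, and `sp` and `g` are our own helper names; this illustrates the identity and is not a proof.

```python
import numpy as np

def sp(M):
    """Spectral radius of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical positive matrices with alpha + beta stochastic.
alpha = np.array([[0.25, 0.05, 0.10],
                  [0.10, 0.30, 0.05],
                  [0.05, 0.05, 0.20]])
beta = np.array([[0.30, 0.20, 0.10],
                 [0.15, 0.25, 0.15],
                 [0.20, 0.10, 0.40]])

def g(x):
    return sp(alpha / x + x * beta)

# Stationary vector pi of alpha + beta (left eigenvector for eigenvalue 1).
w, v = np.linalg.eig((alpha + beta).T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
delta = float(pi @ (alpha - beta) @ np.ones(3))

h = 1e-5
g_prime_at_1 = (g(1 + h) - g(1 - h)) / (2 * h)   # central difference
print("g(1) =", g(1.0), " g'(1) =", g_prime_at_1, " -delta =", -delta)
```

Here `delta` is negative, so the curve `g` rises through the value 1 at `x = 1` and must cross 1 once more at some `x < 1`.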

Now let $\sigma \ge 0$ be any fixed point (not necessarily in $S_0$)
and let $r = \mathrm{sp}(\sigma)$. Note that $\sigma \ge \alpha > 0$, so $\sigma$ is in fact
irreducible. By the discussion in Chapter II, the matrix
$D^{-1}\sigma D / r$ is stochastic, where $D$ is the diagonal matrix obtained
from the PRF right eigenvector of $\sigma$. Thus the identity

$\dfrac{D^{-1}\sigma D}{r} = \dfrac{D^{-1}\alpha D}{r} + r\,D^{-1}\beta D \left(\dfrac{D^{-1}\sigma D}{r}\right)^2$

shows that $\alpha/r + r\beta$ is similar to a stochastic matrix. This
implies $\mathrm{sp}(\alpha/r + r\beta) = 1$. Thus we have a characterization of
the PRF root of a nonnegative fixed point. If $\delta > 0$ then
$g'(1) < 0$. Together with convexity this shows that $r = 1$
is the only value of $r \le 1$ s.t. $g(r) = 1$. Thus if $\sigma \in S_0$ its
spectral radius must be 1. (For any $M \in S_0$, $\mathrm{sp}(M) \le 1$.) From
the discussion on PRF roots (specifically, the row sum bounds
on $r$), the only irreducible substochastic matrices that have
spectral radius one are in fact stochastic. Thus if $\sigma \in S_0$
and $\sigma$ is a fixed point, $\sigma \in S_1$.

3) If $\delta < 0$ then $g'(1) > 0$. Since $\alpha > 0$, $g(r) \to \infty$ as $r \to 0$.
Together with convexity, this shows that there is exactly one
value of $r < 1$ s.t. $g(r) = 1$. This allows for the possibility
(though by itself doesn't prove the existence of) another
fixed point in $S_0$, with spectral radius $< 1$. The matrix $A^\infty(0)$
can be shown to be such a fixed point, and $A^\infty(0)$ is in fact
strictly substochastic and is the unique fixed point in $S_2$.

Let us apply this to the voice/data link. For the queue
to be stable, $\delta$ must be negative. In this case, the iteration
$\sigma_{k+1} = \alpha + \beta\sigma_k^2$, $\sigma_0 = 0$, converges to $\sigma^+$, which is strictly
substochastic. The other fixed point (by Brouwer's Theorem) has no
apparent probabilistic meaning. For the iteration $\sigma_{k+1} = \beta + \alpha\sigma_k^2$,
the quantity $\pi^T(\beta - \alpha)\mathbf{1}$ is positive, so this converges to a
stochastic solution, which is $\sigma^-$. As mentioned, the starting point
in this case is not important, and $\sigma^-$ is unique in $S_0$.

Extending the probabilistic reasoning used to derive the
fixed point equation, we see that $\sigma^+$ is the sum over all paths
that are "loops followed by an alpha". Specifically, $\sigma^+$ is
a sum over terms of the form

$[\beta^{i_1}\alpha^{j_1} \cdots \beta^{i_L}\alpha^{j_L}]\,\alpha$

where $\sum_{\ell=1}^{L} i_\ell = \sum_{\ell=1}^{L} j_\ell$ and, for any $M < L$, $\sum_{\ell=1}^{M} j_\ell \le \sum_{\ell=1}^{M} i_\ell$. Such
a term corresponds to a sample path in which the first $i_1$ changes
in the $K^H$ coordinate are negative, the next $j_1$ positive, etc.,
until the final $j_L$ changes bring it back to 0, and then a
transition to $+1$ occurs. With each step, the algorithm computes
more of these terms. For example,

$\sigma_0 = 0$
$\sigma_1 = \alpha$
$\sigma_2 = \alpha + \beta\alpha^2$
$\sigma_3 = \alpha + \beta\alpha^2 + \beta\alpha\beta\alpha^2 + \beta^2\alpha^3 + \beta^2\alpha^2\beta\alpha^2$.

It seems difficult to find any characterization of the probability
mass which is still uncounted after the $k$th iteration, i.e.
what is the convergence rate. This is a real problem. For
$\sigma^-$, we know that the final answer is stochastic. Thus one
has an absolute test of convergence. For $\sigma^+$ we have no such
test yet. One can look at the successive componentwise
differences, but it seems difficult to relate this "stepsize"
to the true distance from the limit. This problem needs more
investigation.

Finally, we discuss another algorithm. In the equation
$\sigma = \alpha + \beta\sigma^2$ one can "solve" for $\sigma$: $\sigma = [I - \beta\sigma]^{-1}\alpha$, and
this leads to another iteration in an obvious manner. To see
the difference, consider the result of the second step, when
$\sigma_0 = 0$. Then $\sigma_2 = [I - \beta\alpha]^{-1}\alpha = \sum_{\ell=0}^{\infty}(\beta\alpha)^\ell\,\alpha$. In this case
the algorithm is counting all prefixes of the form $\beta\alpha \cdots \beta\alpha$
followed by an $\alpha$. It is unclear as to when one algorithm
"counts faster" than the other. One might even be able to
devise a hybrid procedure.
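
The two iterations can be compared side by side on a small case. This is a sketch on hypothetical 2-phase data (our values, chosen so that `alpha + beta` is stochastic with negative drift); the iteration counts it prints describe this one example and say nothing about the general question raised above.

```python
import numpy as np

alpha = np.array([[0.3, 0.1], [0.1, 0.3]])   # hypothetical data
beta = np.array([[0.5, 0.1], [0.2, 0.4]])
I = np.eye(2)

def run(step, tol=1e-12, max_iter=100_000):
    """Apply `step` repeatedly from sigma_0 = 0 until successive iterates agree."""
    sigma = np.zeros((2, 2))
    for k in range(1, max_iter + 1):
        new = step(sigma)
        if np.max(np.abs(new - sigma)) <= tol:
            return new, k
        sigma = new
    raise RuntimeError("iteration did not converge")

# sigma <- alpha + beta sigma^2
sigma_a, k_a = run(lambda s: alpha + beta @ s @ s)
# sigma <- [I - beta sigma]^{-1} alpha
sigma_b, k_b = run(lambda s: np.linalg.solve(I - beta @ s, alpha))

# Second step of the "solved" iteration equals the series sum_l (beta alpha)^l alpha.
second_step = np.linalg.solve(I - beta @ alpha, alpha)
series = sum(np.linalg.matrix_power(beta @ alpha, l) @ alpha for l in range(200))

print("iterations:", k_a, "vs", k_b)
print("max difference between limits:", np.max(np.abs(sigma_a - sigma_b)))
```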




H.  An  Example 

To make some of this more "concrete", we work out the
details of the case in which $r_i = r\ \forall\ i$. This problem is
rather trivial in that $A(t)$ and $K(t)$ are now statistically
independent, and $K(t)$ is just the classical M/M/1 birth-death
queuing process with parameters $\eta$, $r$ and utilization $\rho = \eta/r < 1$.
For this queue (see Kleinrock [24]), the ergodic probability of
$k$ customers in the system is $(1-\rho)\rho^k$, $k = 0,1,\ldots$ Thus,
for the vector process, $e_{ik} = p_i(1-\rho)\rho^k$, or $e(k) = p\,(1-\rho)\rho^k$.
From the previous discussion, we know that $e^T(k) = p^T[I - \theta]\theta^k$,
and one might suspect that $\theta$ is simply $\rho I$. This is not the
case. In fact, we have shown that $\sigma^+$ is primitive, and hence
$\rho$ (which will turn out to be the PRF root) is a simple
eigenvalue. Thus $\theta$, which is similar to $\sigma^+$, cannot be $\rho I$.

First we recall the following from Chapter III.


• The $Q$ matrix for the speaker process has eigenvalues
$-i(\lambda+\mu)$, $i = 0,\ldots,N$.


2  is  diagonalizable ,  i.e.  2  *  ^ 
row  of  L  is 

,  _  t-1 


a-'-V  '1] 


(1  -  e) 

(*  *  convolution) 


where  the  ith 
*  ^/u  ,  and 


•  £  is  proportional  to  L(0),  specifically,  £ 
or  £  -  L(Q)  •  (1  +  e)'N/2 


IL-il 


(1  ♦  e)‘ 


From the previous sections in this chapter,

$\alpha = ((\eta + r)I - Q)^{-1}\eta$,    $\sigma^+ = \alpha + \beta(\sigma^+)^2$

$\beta = ((\eta + r)I - Q)^{-1} r$,    $\sigma^- = \beta + \alpha(\sigma^-)^2$

$\gamma = ((\eta + r)I - Q - r\sigma^+ - \eta\sigma^-)^{-1}$,    $\theta = \gamma^{-1}\sigma^+\gamma$

where now $r_D$ reduces to $rI$.

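Now define $\hat M = L M L^{-1}$, where $M$ is any of the previous
matrices. Set $D = [(\eta + r)I - \Lambda]^{-1}$, $\Lambda = \mathrm{diagonal}(-i(\lambda+\mu))$, so
$D$ is diagonal. Then a simple computation shows $\hat\alpha = D\eta$,
$\hat\beta = D r$, $\hat\gamma = (D^{-1} - r\hat\sigma^+ - \eta\hat\sigma^-)^{-1}$, $\hat\theta = \hat\gamma^{-1}\hat\sigma^+\hat\gamma$. We will show
that $\hat\sigma^+$ and $\hat\sigma^-$ are diagonal, which implies $\hat\gamma$ diagonal, and thus
$\hat\theta = \hat\sigma^+$, which implies $\theta = \sigma^+$. $\hat\sigma^+$ will have real positive
diagonal terms, and $\rho$ will be the largest, with $\hat\sigma^+_{00} = \rho$. Thus

$e^T(k) = p^T[I - \theta]\theta^k = p^T L^{-1}[I - \hat\sigma^+](\hat\sigma^+)^k L = (1+\varepsilon)^{-N/2} f_0^T [I - \hat\sigma^+](\hat\sigma^+)^k L$

where $f_0$ is the vector $(1,0,\ldots,0)$. (This follows because
$p = L(0)(1+\varepsilon)^{-N/2}$ and $L L^{-1} = I$, so $L^T(0)L^{-1} = f_0^T$.) Continuing,
we obtain $e^T(k) = (1+\varepsilon)^{-N/2}(1-\rho)\rho^k f_0^T L =
(1+\varepsilon)^{-N/2}(1-\rho)\rho^k L^T(0) = p^T(1-\rho)\rho^k$, as claimed.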

Now consider the matrix quadratic equation $\sigma = \alpha + \beta\sigma^2$.
(We drop the "+" for convenience.)

We know that the equation has a unique strictly substochastic
solution, which we want, and at least one other
solution in $S_1$. The meaningful one is the limit of the
recursion $\sigma_{k+1} = \alpha + \beta\sigma_k^2$, $\sigma_0 = 0$. Conjugating the equation
by $L$, we obtain $\hat\sigma = \hat\alpha + \hat\beta\hat\sigma^2$, where $\hat\alpha = D\eta$, $\hat\beta = D r$.
Notice that if we start the transformed recursion with
$\hat\sigma_0 = 0$, then $\hat\sigma_k$ is diagonal for each $k$. For the moment, let
us assume that the fixed point $\hat\sigma$ which corresponds to the
desired $\sigma$, i.e. the $\hat\sigma$ s.t. $L^{-1}\hat\sigma L$ is the correct $\sigma$, is
diagonal. Then, we are left with the individual equations

$\hat\sigma_{ii} = \eta D_{ii} + r D_{ii}\,\hat\sigma_{ii}^2$,  $i = 0,1,\ldots,N$.

These equations are scalar quadratic equations of the form
$z = a + b z^2$
where $a, b > 0$ and $a < b$ (since $\eta < r$). Further $a + b = 1$
if $i = 0$, and $a + b < 1$ if $i \ne 0$. From the quadratic formula, we
obtain the roots

$z = \dfrac{1 \pm \sqrt{1 - 4ab}}{2b}$.

Note $ab < 1/4$ implies the roots are real. A simple geometric
argument shows that both roots are positive, one root is
always less than 1, and the other equals 1 if $a + b = 1$,
and is greater than 1 if $a + b < 1$. Since we are after a strictly
substochastic matrix, we pick the root less than 1 for
each $i$. This is

$\hat\sigma_{ii} = \dfrac{1 - (1 - 4\eta r D_{ii}^2)^{1/2}}{2 r D_{ii}}$.

For $i = 0$, this is $\rho$, and for $i \ne 0$ it is less than $\rho$.

Technically, we still need to show that with this choice
of $\hat\sigma$, the matrix $L^{-1}\hat\sigma L$ is the desired $\sigma^+$. Intuitively,
this is pretty clear. A formal proof can be obtained by showing
that if $\hat\sigma$ is the limit of $\hat\sigma_k = \hat\alpha + \hat\beta\hat\sigma_{k-1}^2$, $\hat\sigma_0 = 0$, then
$L^{-1}\hat\sigma L$ is the limit of the original recursion for $\sigma^+$, starting
with $\sigma_0 = 0$. A simple argument then shows that the $\hat\sigma$ we have
chosen, i.e. with eigenvalues $< 1$, is the limit of $\hat\sigma_k = \hat\alpha + \hat\beta\hat\sigma_{k-1}^2$
starting with $\hat\sigma_0 = 0$.

Similar computations go through for $\sigma^-$, except now we have
the scalar equations $z = a + bz^2$ with $a > b$. For $i = 0$ (where
$a + b = 1$) we obtain a root at $z = 1$ and a root $z > 1$. Thus
we pick $z = 1$. For $i \ne 0$, one root is $< 1$, and the other
is $> 1$. We pick the smaller.

The eigenvalues of $\sigma^+$ other than $\rho$ have no apparent
probabilistic meaning.
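
The scalar roots for the constant-rate example are simple to tabulate. In the sketch below the values of `eta` and `r` are illustrative choices of ours, and `D` holds the diagonal entries $D_{ii} = 1/(\eta + r + i(\lambda+\mu))$ implied by the eigenvalues $-i(\lambda+\mu)$ of $Q$.

```python
import numpy as np

N, lam, mu = 10, 0.75, 0.81    # speaker parameters (illustrative)
eta, r = 1.0, 2.0              # arrival rate and constant service rate (assumed)
rho = eta / r

i = np.arange(N + 1)
D = 1.0 / (eta + r + i * (lam + mu))   # diagonal of [(eta + r)I - Lambda]^{-1}

def smaller_root(a, b):
    """Smaller root of the scalar quadratic z = a + b z^2."""
    return (1.0 - np.sqrt(1.0 - 4.0 * a * b)) / (2.0 * b)

sigma_plus_hat = smaller_root(eta * D, r * D)     # z = eta*D + r*D*z^2
sigma_minus_hat = smaller_root(r * D, eta * D)    # z = r*D + eta*D*z^2

print("eigenvalues of sigma+:", np.round(sigma_plus_hat, 4))
print("eigenvalues of sigma-:", np.round(sigma_minus_hat, 4))
```

The `i = 0` entries come out as `rho` for $\sigma^+$ and 1 for $\sigma^-$, and every other entry is smaller.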
VI. Queue Statistics and Numerical Examples
A. Queue Statistics

The matrix series $\sum_k \theta^k$ obeys the "ordinary calculus rules"
for geometric series, e.g. $\sum_{k=0}^{\infty} k\theta^k = \theta[I - \theta]^{-2}$. Hence,
computing moments of the distribution of the number in system
poses no special problem. Also, one can easily obtain an
expression for the actual "bit" backlog distribution. Specifically,
let $f_k = e^T(k)\mathbf{1}$ denote the marginal probability of
$k$ messages in the system, and let $[\exp(\xi)]^{*k}$ denote the
$k$-fold convolution of the message length distribution $\exp(\xi)$
with itself. Then the backlog distribution is $\sum_{k=0}^{\infty} f_k[\exp(\xi)]^{*k}$.

It is customary to separate a queuing system into the
"queue" and the "service facility".


The total time spent in the system is denoted by $T$, the time
a message spends in the queue is called the waiting time $W$,
and $S$ is the service time. It follows from this definition
that

$\overline{KQ} = \overline{KS} - 1 + \Pr[KS = 0]$

where $KS$ = number in system, $KQ$ = number in queue, and overbar
denotes expectation.

For a constant rate server (and i.i.d. message lengths)
this division is quite natural. A message's service time is
linearly related to its length and is independent of its waiting
time and the service and waiting times of other messages. That
is, the queuing process only affects waiting times. For a
voice/data queue model in which $r_i$ varies with $i$, the distinction
between service and waiting can be less meaningful, as the
following examples indicate.

Consider a service rate vector $(r_0,\ldots,r_N)$ s.t. $r_i = 0$
$\forall\ i \ne 0$, and $r_0$ is "infinite". Assume $\lambda = \mu = 1$. The mean first
passage time from $A = 1$ to $A = 0$ is $\dfrac{1 - p_0}{p_0 N \lambda} \approx \dfrac{2^N}{N}$. More generally,
the passage time from any initial state to state 0 is dominated
by the time to take the last step from 1 to 0, i.e. if a
message arrives when $A \ne 0$, the mean time until the speaker
process reaches 0 is still $\approx 2^N/N$. Now suppose that the mean
message length $\xi^{-1} = 1$. There are two cases. If the mean
interarrival time $1/\eta \ll 2^N/N$, large queues accumulate while $A \ne 0$,
and these are emptied "instantaneously" when $A$ reaches 0.
Thus most messages have a very large waiting time but essentially
no service time. If $1/\eta \gg 2^N/N$, then most messages arrive to
an empty system. Their waiting times are zero, but their
service times are very large. In both cases, the mean system
time is determined by the time it takes for a return to $A = 0$.

The point of this is as follows. A message is "inconvenienced"
by having to share the link with both the speakers and other
messages. To a certain extent, the "service time" reflects
the sharing with the speakers, and the waiting time reflects
the sharing with other messages. Although it is possible to
compute the service time distribution given a start in some
speaker state, the interaction between the queueing process
and voice activity process can affect the distribution of
service initiation states. That is, waiting and service times
are not independent, and the distribution of service initiation
states is generally not the same as the speaker ergodic
distribution $p$.

In any case, it is possible to derive expressions for
the Laplace-Stieltjes transforms of the limiting distributions
of $T$, $W$, and $S$. Let $\hat X_{ij}(z)$ denote the L-S transform of the
joint [service time; voice completion state $= j \mid A(0) = i$]
distribution, i.e.

$\hat X_{ij}(z) = \int_0^\infty e^{-zt}\, d_t \Pr[\text{service time} \le t,\ A = j \text{ at completion} \mid A(0) = i]$

(See Chapter IV.)

Because message lengths are i.i.d., it follows that the
analogous transform for the time to service $k$ successive
messages is $[\hat X(z)]^k$. Further, $([\hat X(0)]^k)_{ij} = \Pr[\text{completion of}$
$k$ messages occurs when $A = j \mid A(0) = i]$. We also make use of
the following observations:

• Poisson arrivals take a "random" look at the system,
so that the equilibrium distribution of $[A, K]$ as
seen by arriving messages is the same as the ergodic
distribution $\{e_{ik}\}$. (This need not be the case for
non-Poisson arrivals; see Kleinrock [24].)

• If a message arrives to a nonempty system and $A = j$,
the distribution of the time needed to complete service
for the message in the service facility is the same as
if a message with length $\sim \exp(\xi)$ starts service when
$A = j$. (This follows from the memoryless property of
the exponential distribution.)

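From all this, it follows that

$T(z) = \sum_{k=0}^{\infty} e^T(k)[\hat X(z)]^{k+1}\mathbf{1} = p^T[I - \theta][I - \theta\hat X(z)]^{-1}\hat X(z)\mathbf{1}$

$W(z) = \sum_{k=0}^{\infty} e^T(k)[\hat X(z)]^{k}\mathbf{1}$

$S(z) = \sum_{k=0}^{\infty} e^T(k)[\hat X(0)]^{k}\hat X(z)\mathbf{1}$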

The matrix $\hat X(0)$ can be related to $\alpha$ and $\beta$ as follows.
If $K(0) \ge 1$, the time to complete a service is the time until
$K(t)$ registers its first decrease. By this we do not mean the
first passage time to $K(0) - 1$, but the time until the first
decrease occurs, with any possible number of intervening
increases. Since $K$ is not bounded above, it follows that

$\hat X = \beta + \alpha\hat X$

(The reasoning is similar to that used to derive the
quadratic equation for the $\sigma$'s.) Thus $\hat X = [I - \alpha]^{-1}\beta$.

A straightforward calculation will show that $[I - \alpha]^{-1}\beta =
(\xi r_D - Q)^{-1}\xi r_D$, which is the expression derived in Chapter IV.
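
The "straightforward calculation" can be confirmed numerically. In this sketch `rate` plays the role of the per-state message service rates (the diagonal of $\xi r_D$), and `alpha`, `beta` are taken in the form $(\eta I + \xi r_D - Q)^{-1}\eta$ and $(\eta I + \xi r_D - Q)^{-1}\xi r_D$. The small speaker population and the numerical rates are assumptions made for illustration.

```python
import numpy as np

N, lam, mu = 3, 0.75, 0.81
eta = 2.0                                   # message arrival rate (assumed)
rate = np.array([6.0, 4.0, 2.0, 0.0])       # message service rates (assumed); no service when all talk

# Birth-death generator for the number of active speakers.
Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = (N - i) * lam
    if i > 0:
        Q[i, i - 1] = i * mu
    Q[i, i] = -Q[i].sum()

A = np.diag(eta + rate) - Q
alpha = np.linalg.solve(A, eta * np.eye(N + 1))
beta = np.linalg.solve(A, np.diag(rate))

X_from_recursion = np.linalg.solve(np.eye(N + 1) - alpha, beta)   # [I - alpha]^{-1} beta
X_closed_form = np.linalg.solve(np.diag(rate) - Q, np.diag(rate))

print(np.max(np.abs(X_from_recursion - X_closed_form)))
print(X_closed_form.sum(axis=1))   # rows sum to 1: every service eventually completes
```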

B.  Numerical  Examples 

We have applied the matrix-geometric algorithms to a
voice/data queue with 10 speakers. The other speaker parameters
are $\lambda = .75$, $\mu = .81$ (per second), so mean silence $= \lambda^{-1} = 1.34$ sec.
and mean talkspurt $= \mu^{-1} = 1.23$ sec. [6]. The link has capacity
320 Kbps, and each speaker demands 32 Kbps when in talkspurt.
Thus $r_i = 320 - 32i$ Kbps, so $\bar r \approx 166$ Kbps.
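
The average rate can be reproduced directly from these parameters. The sketch assumes the speakers are independent, so that the number in talkspurt is binomial with talk probability $\lambda/(\lambda+\mu)$; the variable names are ours.

```python
from math import comb

N, lam, mu = 10, 0.75, 0.81
capacity, per_speaker = 320.0, 32.0          # Kbps

q = lam / (lam + mu)                         # probability a speaker is in talkspurt
p = [comb(N, i) * q**i * (1 - q)**(N - i) for i in range(N + 1)]
r = [capacity - per_speaker * i for i in range(N + 1)]

r_bar = sum(pi * ri for pi, ri in zip(p, r))
# Fraction of time spent in states whose rate does not exceed the average.
low_rate_fraction = sum(pi for pi, ri in zip(p, r) if ri <= r_bar)

print(f"r_bar = {r_bar:.1f} Kbps")
print(f"fraction of time with r_i <= r_bar: {low_rate_fraction:.2f}")
```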

One important question is how the mix of data traffic
affects performance, i.e. for fixed total average data rate,
$\eta/\xi$, how does performance depend on $\eta$ and $\xi$ individually. We
consider three mixes for each value of the utilization
$\rho = .1, .2, \ldots, .9$, where $\rho = \eta/(\xi\bar r)$. The three mixes are
referred to as cases A, B, and C, and the respective mean message
lengths are 500, 1000, and 2000 bits. Thus, in each case,
$\eta$ is varied to obtain the appropriate value of $\rho$.

Notice that for states 5 - 10, $r_i \le \bar r$. The total
occupation fraction of these states is about .57. We have
also computed the mean first passage times from each speaker
state to state $M = \lceil 4.8 \rceil = 5$. These are

State   Time (sec)      State   Time (sec)
  0       1.23            6       .36
  1       1.1             7       .61
  2       .93             8       .80
  3       .72             9       .95
  4       .43            10      1.08
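
These passage times follow from the standard one-step recursions for a birth-death chain. The sketch below assumes state $i$ moves up at rate $(N-i)\lambda$ and down at rate $i\mu$; the function name is ours.

```python
N, lam, mu, target = 10, 0.75, 0.81, 5

def mean_first_passage_to(target, N, lam, mu):
    """Mean first passage time from every speaker state to `target`."""
    up = [0.0] * (N + 1)      # up[i]: mean time to go from i to i+1
    down = [0.0] * (N + 1)    # down[i]: mean time to go from i to i-1
    for i in range(0, N):
        birth, death = (N - i) * lam, i * mu
        prev = up[i - 1] if i > 0 else 0.0
        up[i] = (1.0 + death * prev) / birth
    for i in range(N, 0, -1):
        birth, death = (N - i) * lam, i * mu
        nxt = down[i + 1] if i < N else 0.0
        down[i] = (1.0 + birth * nxt) / death
    times = []
    for i in range(N + 1):
        if i < target:
            times.append(sum(up[i:target]))              # climb i -> target
        elif i > target:
            times.append(sum(down[target + 1:i + 1]))    # descend i -> target
        else:
            times.append(0.0)
    return times

times = mean_first_passage_to(target, N, lam, mu)
for state, t in enumerate(times):
    print(state, round(t, 2))
```

The computed values agree with the tabulated ones to within about 0.01 sec.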

The results are organized in tabular form, and the notation
is as follows:

$\overline{KS}$ = mean number in system
VARKS = variance of KS
STDVKS = $\sqrt{\mathrm{VARKS}}$
COEFVAR = STDVKS/$\overline{KS}$
$\overline{KQ}$ = mean number in queue
T = mean system time
W = mean waiting time
SERV = mean service time
Pr[0] = Pr[KS = 0]
-DRIFT = $-\pi^T(\alpha - \beta)\mathbf{1} = \dfrac{\xi\bar r - \eta}{\xi\bar r + \eta}$ (see Chapt. V)
MAXDEC = maximum asymptotic decay rate, i.e. PRF root of $\sigma^+$.

Our "benchmark" for comparison is an M/M/1 queue with service
rate $\bar r = \sum_i p_i r_i$, i.e. $r_i = \bar r\ \forall\ i$. This is appropriate since we
wish to determine how the service rate variability affects
performance. Each of the first nine tables presents
the above statistics for all cases (one table for each
value of $\rho$). Each of the next two tables presents selected
statistics for all values of $\rho$ and all mixes. In the second
of these two tables, the quantity SERV(random) refers to
$\sum_i p_i\,\mathrm{SERV}_i$, where $\mathrm{SERV}_i$ is the mean effective service time given
a start in state $i$, and $p_i$ is the ergodic probability that
$A = i$. This is the average service time that a message arriving
"randomly" to an empty system would incur. As indicated, this
is not necessarily the actual mean service time incurred by
messages since the queuing process does interact with the voice
activity process. For purposes of comparison, we mention that
SERV(random) is 3.5 ms, 6.3 ms and 13.6 ms in cases A, B, and C
respectively. Finally, the last three tables present some
points from the tails of the respective distributions, e.g.
TAIL 5 = Pr[KS > 5].

In the case of $\sigma^-$, the iteration $\sigma_\ell = \beta + \alpha\sigma_{\ell-1}^2$
was run until $\sigma_\ell$ was stochastic to within $10^{-4}$ or so. In
the case of $\sigma^+$ (for which we have no absolute test), the
procedure was run until $\max_{i,j} |(\sigma_\ell)_{ij} - (\sigma_{\ell-1})_{ij}| \le 10^{-7}$ to $10^{-8}$.
We did not perform an error propagation analysis, but this
stopping criterion seemed reasonably adequate. Occasionally,
we ran some cases until the maximum difference was $10^{-12}$ or so.
This did not result in any drastic changes.
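
The two stopping rules can be illustrated on the 10-speaker link. The sketch below is a reconstruction under stated assumptions, not the original program: it takes case B (1000-bit messages) at utilization 0.5, a binomial speaker distribution, and $\alpha = (\eta I + \xi r_D - Q)^{-1}\eta$, $\beta = (\eta I + \xi r_D - Q)^{-1}\xi r_D$. The function and variable names are ours.

```python
import numpy as np
from math import comb

N, lam, mu = 10, 0.75, 0.81
r_bits = 320e3 - 32e3 * np.arange(N + 1)     # link rate left for data, bits/sec
xi = 1.0 / 1000.0                            # 1/(mean message length), case B
q = lam / (lam + mu)
p = np.array([comb(N, i) * q**i * (1 - q)**(N - i) for i in range(N + 1)])
rho = 0.5                                    # chosen utilization (assumed)
eta = rho * xi * float(p @ r_bits)           # messages/sec

Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = (N - i) * lam
    if i > 0:
        Q[i, i - 1] = i * mu
    Q[i, i] = -Q[i].sum()

R = np.diag(xi * r_bits)                     # message service rates
A = eta * np.eye(N + 1) + R - Q
alpha = np.linalg.solve(A, eta * np.eye(N + 1))
beta = np.linalg.solve(A, R)

def run(a, b, stop, max_iter=500_000):
    """Iterate s <- a + b s^2 from s = 0 until stop(new, old) is true."""
    s = np.zeros_like(a)
    for k in range(1, max_iter + 1):
        new = a + b @ s @ s
        if stop(new, s):
            return new, k
        s = new
    raise RuntimeError("no convergence")

# sigma-: absolute test, stop when the iterate is stochastic to within 1e-4.
sigma_minus, k_minus = run(
    beta, alpha, lambda new, old: np.max(np.abs(new.sum(axis=1) - 1.0)) <= 1e-4)
# sigma+: no absolute test, stop on the largest successive difference.
sigma_plus, k_plus = run(
    alpha, beta, lambda new, old: np.max(np.abs(new - old)) <= 1e-8)

maxdec = float(np.max(np.abs(np.linalg.eigvals(sigma_plus))))   # PRF root of sigma+
print("iterations:", k_minus, k_plus, " MAXDEC ~", round(maxdec, 4))
```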

For an M/M/1 queue, the PRF root is $\rho$ and $f_k = (1-\rho)\rho^k$,
i.e. the distribution of KS is independent of the load mix.
Thus for fixed $\rho$, the average system time approaches zero as
$O(1/\eta)$ as $\eta, \xi \to \infty$. For the voice/data queue, the distribution
of KS is not independent of the load mix, and we can qualitatively
explain the observed behavior as follows. (In this discussion
we consider $\rho$ fixed, i.e. we let $\eta, \xi$ approach either 0 or $\infty$
with $\eta/\xi$ remaining fixed.) As indicated in Chapter IV, the
behavior of the voice/data queue approaches that of the analogous
M/M/1 queue as $\eta, \xi \to 0$. This is because the speaker process
moves "infinitely fast" relative to the data arrival and service
processes so that each message is effectively served at a constant
rate $\bar r$. As $\eta, \xi \to \infty$, the individual queuing processes in each
speaker state become "decoupled", and there are two possible
types of behavior. If $\rho_i = \eta/(\xi r_i) < 1\ \forall\ i$, then the behavior
approaches that of $N + 1$ decoupled, stable, M/M/1 queues with
utilizations $\rho_i$. For example, $\overline{KS}$ will approach $\sum_i p_i \dfrac{\rho_i}{1 - \rho_i}$.
Note, by convexity,

$\sum_i p_i \dfrac{\rho_i}{1 - \rho_i} \;\ge\; \dfrac{\sum_i p_i\rho_i}{1 - \sum_i p_i\rho_i} \;\ge\; \dfrac{\eta/(\xi\sum_i p_i r_i)}{1 - \eta/(\xi\sum_i p_i r_i)} = \dfrac{\rho}{1 - \rho}$.

That is, the variable service rate does degrade performance,
though $\overline{KS}$ remains bounded as $\eta \to \infty$. If there is at least
one overloaded or unstable state, i.e. some $i$ s.t. $\rho_i > 1$, the
behavior is qualitatively different. While the voice process
is sitting in state $i$, the mean number in system is growing
as $(\eta - \xi r_i)t$. As long as overall stability, i.e. $\eta < \xi\bar r$,
is present, these backlogs are eventually emptied. However,
as $\eta, \xi \to \infty$, the contribution to $\overline{KS}$ made by the unstable
states becomes the dominant contribution, and, asymptotically,
$\overline{KS}$ will grow linearly with $\eta$. The value of the slope of this
growth depends on many factors, and there does not appear to
be a simple estimate. Thus, we have the following qualitative
picture of the behavior of $\overline{KS}$.


(No assertions about monotonicity or convexity are intended in these graphs, though one would certainly expect at least monotonicity.)

For the M/M/1 and voice/data with $\rho_i < 1\ \forall i$, KS remains bounded, and it follows that as $\eta, \xi \to \infty$, the average system time goes to 0. For the third case, the average number in system grows linearly with $\eta$, but since the average message length is scaled accordingly, it follows that the average backlog in bits, and hence the system time, remain bounded.

In fact, if KS grows as $b\eta$ it follows from Little's Theorem that $T \to b$. Thus, the existence of an unstable state does not preclude overall stability, but it does lead to the somewhat "unconventional" result that even if individual messages become small, their delay is bounded away from zero, for fixed total data rate.
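The bounded case (every speaker state stable) can be illustrated numerically. The sketch below compares the decoupled limit $\sum_i p_i \rho_i/(1-\rho_i)$ with the M/M/1 value $\rho/(1-\rho)$ at the same mean service rate; the state probabilities, rates, and parameter values are made up for the example.

```python
def decoupled_mean_number(probs, rates, eta, xi):
    """Limit of the mean number in system when each speaker state behaves as its own M/M/1 queue."""
    total = 0.0
    for p_i, r_i in zip(probs, rates):
        rho_i = eta / (xi * r_i)
        if rho_i >= 1.0:
            raise ValueError("unstable state: the decoupled limit is not bounded")
        total += p_i * rho_i / (1.0 - rho_i)
    return total


def mm1_mean_number(probs, rates, eta, xi):
    """Mean number in system for an M/M/1 queue served at the average rate."""
    mean_rate = sum(p_i * r_i for p_i, r_i in zip(probs, rates))
    rho = eta / (xi * mean_rate)
    return rho / (1.0 - rho)


# Hypothetical three-state example (probabilities and rates are illustrative only).
probs = [0.25, 0.5, 0.25]
rates = [3.0, 2.0, 1.0]
decoupled = decoupled_mean_number(probs, rates, eta=0.5, xi=1.0)
reference = mm1_mean_number(probs, rates, eta=0.5, xi=1.0)
print(decoupled, reference)
```

In this example the decoupled value exceeds the M/M/1 value, as the convexity argument predicts, while remaining finite.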

As $\eta, \xi \to \infty$, the arriving data stream effectively becomes a steady flow of rate $\eta/\xi$ bps. This is because during an interval of length t, the mean number of arriving bits is $\eta t/\xi$ but the variance is $2\eta t/\xi^2$, and this goes to zero as $\eta, \xi \to \infty$, $\eta/\xi$ fixed. This suggests using a "flow model" for the backlog when $\eta$ and $\xi$ are large. That is, one assumes that the backlog X(t) grows (or shrinks) deterministically at rate $\eta/\xi - r_i$ when A = i. This model should give the correct asymptotic dependence of the backlog on $\eta$, i.e. if KS $\to b\eta$, the flow model will show that the average backlog is $(\eta/\xi)b$. For the flow model, the vector process [A(t), X(t)] is a Markov process, and its ergodic probability distribution vector G(x), where $G_i(x) = \Pr[A = i, X \le x]$, satisfies the system of differential equations

$$\frac{d}{dx}\, G^T(x)\big((\eta/\xi) I - r_D\big) = G^T(x)\, Q$$

subject to the boundary conditions

• $G_i(0) = 0$ if $\eta/\xi > r_i$, i.e. the backlog cannot be empty in unstable states;

• $G_i(\infty) = p_i$, i.e. the marginal distribution of A is the original ergodic distribution p.

(This model was used by Berger in [6], and the reader can find a detailed exposition there.)
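The claim that the arrival stream approaches a steady flow can be checked from the moments quoted in the text: with Poisson arrivals at rate $\eta$ and exponential lengths of mean $1/\xi$ bits, the number of bits arriving in time t has mean $\eta t/\xi$ and variance $2\eta t/\xi^2$. A small sketch (the function name and the numerical values are illustrative assumptions):

```python
def bit_arrival_moments(eta, xi, t):
    """Mean and variance of the bits arriving in an interval of length t.

    Assumes Poisson message arrivals at rate eta and exponentially
    distributed message lengths with mean 1/xi bits.
    """
    mean_bits = eta * t / xi
    variance_bits = eta * t * (2.0 / xi ** 2)  # rate * t * E[length^2]
    return mean_bits, variance_bits


# Scale eta and xi together so that eta/xi stays fixed.
for scale in (1, 10, 100, 1000):
    mean_bits, variance_bits = bit_arrival_moments(2.0 * scale, 4.0 * scale, t=1.0)
    print(scale, mean_bits, variance_bits)
```

The mean stays fixed while the variance falls as 1/scale, which is the sense in which the input becomes a deterministic flow.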

It is sometimes easier to work with these equations directly rather than with limiting arguments on the previous expression $\sum_{k=0}^{\infty} f_k\,[\exp(\xi)]^{*k}$. For example, it follows from the structure of the differential equation and the fact that the boundary conditions depend only on $\lambda/\mu$, that the backlog random variable is linear in $1/\lambda$, $1/\mu$ for $\lambda/\mu$ fixed, i.e. $G(x, s\lambda, s\mu) = G(sx, \lambda, \mu)$. This implies that the average system time is linear in $1/\lambda$, $1/\mu$, in the limit of small, frequent messages.


KS
                 A                     B                     C
  ρ     Voice/Data   M/M/1    Voice/Data   M/M/1    Voice/Data   M/M/1
  .1      .138       .11        .135       .11        .132       .11
  .2      .348       .25        .33        .25        .33        .25
  .3      .7         .43        .64        .43        .59        .43
  .4      1.47       .67        1.21       .67        1.05       .67
  .5      3.14       1          2.31       1          1.86       1
  .6      6.97       1.5        4.68       1.5        3.42       1.5
  .7      16.3       2.33       9.8        2.33       6.7        2.33
  .8      41.4       4          23.7       4          14.7       4
  .9      134        9          73         9          42.2       9

STDVKS
                 A                     B                     C
  ρ     Voice/Data   M/M/1    Voice/Data   M/M/1    Voice/Data   M/M/1
  .1      .475       .351       .426       .351       .406       .351
  .2      1.07       .56        .84        .56        .74        .56
  .3      2.3        .78        1.6        .78        1.2        .78
  .4      4.87       1.05       3.04       1.05       2.14       1.05
  .5      9.8        1.4        5.7        1.4        3.72       1.4
  .6      19.2       1.94       10.8       1.94       6.54       1.94
  .7      37.3       2.79       20         2.79       11.8       2.79
  .8      75.9       4.47       40.4       4.47       22.7       4.47
  .9      189        9.5        99         9.5        54.4       9.5

COEFVAR
                 A                     B                     C
  ρ     Voice/Data   M/M/1    Voice/Data   M/M/1    Voice/Data   M/M/1
  .1      3.4        3.2        3.2        3.2        3.1        3.2
  .2      3.1        2.24       2.5        2.24       2.3        2.24
  .3      3.3        1.8        2.5        1.8        2.03       1.8
  .4      3.3        1.57       2.5        1.57       2.0        1.57
  .5      3.12       1.4        2.5        1.4        2          1.4
  .6      2.75       1.29       2.31       1.29       1.91       1.29
  .7      2.29       1.2        2.04       1.2        1.76       1.2
  .8      1.83       1.11       1.7        1.11       1.55       1.11
  .9      1.41       1.06       1.36       1.06       1.28       1.06

Selected Queue Length Statistics for
All Values of ρ and All Cases
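The M/M/1 entries in the queue length table can be reproduced from the geometric distribution $f_k = (1-\rho)\rho^k$, whose mean is $\rho/(1-\rho)$ and whose standard deviation is $\sqrt{\rho}/(1-\rho)$. The short sketch below (function name chosen for this example) prints the three statistics for a few loads:

```python
import math


def mm1_queue_length_stats(rho):
    """Mean, standard deviation and coefficient of variation of the M/M/1 number in system."""
    mean = rho / (1.0 - rho)
    std = math.sqrt(rho) / (1.0 - rho)
    return mean, std, std / mean


for rho in (0.1, 0.5, 0.9):
    mean, std, coefvar = mm1_queue_length_stats(rho)
    print(f"rho={rho}: KS={mean:.3f} STDVKS={std:.3f} COEFVAR={coefvar:.2f}")
```

The voice/data entries have no such closed form and come from the numerical procedure described in the text.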
                                  A                     B                     C
  ρ    Statistic         Voice/Data   M/M/1    Voice/Data   M/M/1    Voice/Data   M/M/1
  .1   T                 [illegible]  3.3        8.1        6.6        15.9       13.2
       W                   .7         .3         1.3        .6         2.3        1.2
       Actual SERV         3.4        3.0        6.8        6.0        13.6       12.0
       T/SERV (random)     1.17       1.1        1.29       1.1        1.17       1.1
  .2   T                   5.2        3.7        9.9        7.4        19         14.8
       W                   1.8        .7         3.2        1.4        5.6        2.8
       Actual SERV         3.4        3.0        6.3        6.0        13.5       12.0
       T/SERV (random)     1.49       1.23       1.57       1.23       1.4        1.23
  .3   T                   7.25       4.3        13         8.6        24         17.2
       W                   3.9        1.3        6.1        2.6        10         5.2
       Actual SERV         3.4        3.0        6.8        6.0        13         12.0
       T/SERV (random)     2.05       1.43       2.06       1.43       1.76       1.43
  .4   T                   11.1       5.0        18.2       10         32         20
       W                   7.7        2.0        11.6       4          18.4       8
       Actual SERV         3.4        3.0        6.6        6          13.6       12
       T/SERV (random)     3.17       1.66       2.8        1.66       2.35       1.66
  .5   T                   18.9       6.0        27.7       12         45         24
       W                   15.6       3.00       21         6          31         12
       Actual SERV         3.3        3.00       6.7        6          14         12
       T/SERV (random)     5.4        2          4.4        2          3.31       2
  .6   T                   35         7.5        4.7        15         68         30
       W                   31.7       4.5        40         9          55         18
       Actual SERV         3.3        3.0        7          6          13         12
       T/SERV (random)     10         2.5        7.46       2.5        5          2.5
  .7   T                   70         10         84         20         115        40
       W                   66.3       7          78         14         102        28
       Actual SERV         3.2        3          6          6          13         12
       T/SERV (random)     20         3.3        13         3.3        8.46       3.2
  .8   T                   156        15         179        30         221        60
       W                   152.8      12         172.7      24         208        48
       Actual SERV         3.2        3          6.3        6          13         12
       T/SERV (random)     45         5          28.4       5          16         45
  .9   T                   449        30         489        60         564        120
       W                   446        27         483        54         552        108
       Actual SERV         3          3          6          6          12         12
       T/SERV (random)     128.3      10         77.6       10         41.5       10

Selected Time Statistics for All Values
of ρ and All Cases (in milliseconds)
TAIL 10

  ρ       A          B          C           M/M/1
  .1      .0002      .000051    .0000065    1 x 10^-10
  .2      .0016      .0008      .00028      1 x 10^-7
  .3      .0078      .0047      .0024       5.8 x 10^-6
  .4      .025       .018       .011        .0001
  .5      .067       .052       .037        .0098
  .6      .14        .12        .094        .006
  .7      .27        .24        .20         .028
  .8      .46        .42        [illegible] .11
  .9      .7         .67        .64         .35
  ρ       A             B             C             M/M/1
  .1      7.1 x 10^-5   7.8 x 10^-6   2.8 x 10^-7   1 x 10^-15
  .2      .00011        .00026        4.4 x 10^-5   1.3 x 10^-10
  .3      .0043         .0019         .00061        1.4 x 10^-8
  .4      .015          .009          .0041         1.1 x 10^-6
  .5      .046          .031          .017          3.1 x 10^-5
  .6      .11           .082          .054          .00047
  .7      .22           .18           .14           .0047
  .8      .40           .36           .30           .035
  .9      .66           .62           .57           .20
C.  A Queuing Inequality

From the previous discussion, it appears that the M/M/1 queue, i.e. $r = \bar r\,1$, is the "best" queue in the sense of minimizing KS or T among service rate vectors with mean $\bar r$. This is consistent with the queuing theory "metaprinciple" that, for fixed average values, "performance" degrades as "randomness" increases. One can attempt to make this notion precise in several ways, which we now explore. First some notation and "context".

For a random variable X, let $F_X$ denote its distribution function and $F_X^c = 1 - F_X$. Note that

$$E(X) = \int_0^\infty F_X^c(s)\,ds - \int_{-\infty}^0 F_X(s)\,ds.$$

(Use integration by parts.) We assume that each term in the difference is finite, i.e. $E(|X|) < \infty$. For two random variables X, Y, we write $X \le_1 Y$ if $F_X^c(s) \le F_Y^c(s)$ for all s. This notion of "inequality" is sometimes called stochastic dominance and is quite strong. For example, if X and Y are nonnegative, then $X \le_1 Y$ implies $E(X^n) \le E(Y^n)$, n = 1,2,..., since

$$E(X^n) = n \int_0^\infty s^{n-1} F_X^c(s)\,ds$$

for a nonnegative random variable X.

We write $X \le_2 Y$ if $\int_t^\infty (F_X^c(s) - F_Y^c(s))\,ds \le 0$ for all t. This is equivalent to $E((X-t)^+) \le E((Y-t)^+)$ for all t, where $(x)^+ = x$, $x \ge 0$; $(x)^+ = 0$, $x < 0$. Although not as strong as $\le_1$, the notion $\le_2$ does retain some features of a measure of the "smallness" and "determinism" of a random variable, as the following properties indicate.
1) If $X \le_2 Y$ then $E(X) \le E(Y)$.

Proof: By previous remarks, it suffices to show $\int_{-\infty}^{\infty} (F_X^c(s) - F_Y^c(s))\,ds \le 0$. Set $G(t) = \int_t^\infty (F_X^c(s) - F_Y^c(s))\,ds$. $X \le_2 Y$ implies $G(t) \le 0$ for all t. Further, the assumptions $E(|X|) < \infty$, $E(|Y|) < \infty$ rule out the possibility that G(t) "oscillates" as $t \to -\infty$, i.e. these assumptions imply

$$\int_{-\infty}^{\infty} \left| F_X^c(s) - F_Y^c(s) \right| ds < \infty$$

so that G(t) must converge as $t \to -\infty$. ||

2) If E(X) = E(Y) and $X \le_2 Y$, then var(X) $\le$ var(Y).

Proof: It suffices to show $E(X^2) \le E(Y^2)$. This follows from the hypothesis $X \le_2 Y$ and the identity

$$E(X^2) - E(Y^2) = 2 \int_{-\infty}^{\infty} dt \left[ \int_t^\infty (F_X^c(s) - F_Y^c(s))\,ds \right]. \quad ||$$

As a "partial converse" to 2) we have

3) If X is deterministic and $E(X) \le E(Y)$ then $X \le_2 Y$.

Proof: If t > X, then $E((X-t)^+) = 0 \le E((Y-t)^+)$. For $t \le X$ we reason as follows. $E(X) \le E(Y)$ implies

$$\int_t^\infty (F_X^c(s) - F_Y^c(s))\,ds \le -\int_{-\infty}^t (F_X^c(s) - F_Y^c(s))\,ds.$$

Now for any $s \le t$, $F_X^c(s) = 1 \ge F_Y^c(s)$ since $t \le X$. Thus the right side of the last inequality is $\le 0$. ||
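For random variables with finitely many values, the ordering $\le_2$ can be tested directly through the equivalent condition $E((X-t)^+) \le E((Y-t)^+)$. The sketch below is an illustration only (function names are invented for the example). It relies on the fact that both sides are piecewise linear in t with kinks only at support points, that their difference is constant below the smallest support point and zero above the largest, so it is enough to check t at the support points.

```python
def expected_excess(values, probs, t):
    """E((X - t)^+) for a discrete random variable."""
    return sum(p * max(v - t, 0.0) for v, p in zip(values, probs))


def leq2(x_values, x_probs, y_values, y_probs, tol=1e-12):
    """True if X <=_2 Y, checked at every support point of X and Y."""
    checkpoints = sorted(set(x_values) | set(y_values))
    return all(
        expected_excess(x_values, x_probs, t)
        <= expected_excess(y_values, y_probs, t) + tol
        for t in checkpoints
    )


# Property 3: a deterministic X with E(X) <= E(Y) satisfies X <=_2 Y.
deterministic = ([2.0], [1.0])
spread = ([1.0, 3.0], [0.5, 0.5])  # same mean, larger variance
print(leq2(*deterministic, *spread))  # deterministic <=_2 spread
print(leq2(*spread, *deterministic))  # the converse fails
```

The example also matches property 2: the two variables have equal means, and the one that is smaller in the $\le_2$ sense has the smaller variance.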


Stoyan [25] has used $\le_2$ to make the "metaprinciple" precise in the following way. Let $A_i$ and $B_i$, i = 1,2,... denote the interarrival and service times respectively for two G/G/1 queuing systems. Then: if $E(A_1) = E(A_2)$, $A_1 \le_2 A_2$, $B_1 \le_2 B_2$, then $W_1 \le_2 W_2$, $W_i$ = waiting time. In particular, for fixed arrival time distribution A and mean service time, the A/D/1 is "best" among all A/B/1. (Apply property (3).) Similarly D/B/1 is "best" among A/B/1 for B fixed in distribution but A fixed only in mean. (D = deterministic.)

For  the  voice/data  queue,  we  have  been  able  to  establish 
the  following  inequality. 

Proposition: Let B(t, r) denote the backlog (in bits) at time t for a voice/data queue with service rate vector r and with an initial speaker state drawn from the stationary distribution p, and an initially empty backlog. Then

$$B(t, \bar r\,1) \le_2 B(t, r)$$

where $\bar r = p^T r$, and all other parameters, i.e. $\lambda, \mu, \eta, \xi$, are the same in the two cases.

Our proof uses the following characterization of the backlog.

Lemma: Let U(t) denote the total number of bits arriving up to time t. Let R(t, r) denote the total potential amount of service up to time t, i.e. $R(t, r) = \int_0^t r(A(s))\,ds$, $r(A = i) = r_i$. Then

$$B(t, r) = \sup_{0 \le y \le t} \big( U(t) - U(y) - R(t, r) + R(y, r) \big).$$
Proof:

$$B(t, r) = U(t) - R(t, r) + \int_0^t I[B(s, r) = 0]\, r(A(s))\,ds,$$

where I(·) is the indicator function. This integral term gives the amount of service that is counted in R(t, r) but which should not be counted because work is not done during idles. That is, the current backlog = total arrival - total potential work + work counted during idles. The following picture "shows" that the integral is given by

$$\sup_{0 \le y \le t} \big( R(y, r) - U(y) \big).$$

[Figure: graph of R(y, r) - U(y)]

It is clear that a new idle starts whenever the graph crosses its previous maximum. Thus the sup counts that part of R(t, r) which accumulates in idles. ||
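A discrete-time analogue of the lemma is easy to check numerically. The sketch below is an assumption-laden illustration, not the continuous-time model of the text: time is slotted, `arrivals[k]` and `service[k]` are the bits arriving and the potential service in slot k, and the backlog evolves by the usual one-step recursion. The recursion and the sup formula should then agree at every time.

```python
def backlog_by_recursion(arrivals, service):
    """Backlog after the last slot, starting empty: B <- max(B + u - r, 0)."""
    backlog = 0
    for u, r in zip(arrivals, service):
        backlog = max(backlog + u - r, 0)
    return backlog


def backlog_by_sup(arrivals, service):
    """Backlog via sup over y of (U(t) - U(y)) - (R(t) - R(y)), with 0 <= y <= t."""
    net = [0]  # net[y] = U(y) - R(y)
    for u, r in zip(arrivals, service):
        net.append(net[-1] + u - r)
    return max(net[-1] - net_y for net_y in net)


# Hypothetical slot-by-slot data (integers keep the comparison exact).
arrivals = [5, 0, 0, 7, 1, 0]
service = [2, 2, 2, 3, 3, 3]
for t in range(len(arrivals) + 1):
    print(t, backlog_by_recursion(arrivals[:t], service[:t]),
          backlog_by_sup(arrivals[:t], service[:t]))
```

The term y = t contributes zero to the sup, which is what keeps the backlog nonnegative.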

Proof of Proposition:

Set G(t, y, r) = U(t) - U(y) - R(t, r) + R(y, r). Since $(a-b)^+ = \sup(a, b) - b$ for any numbers a, b, the proposition is equivalent to

$$E \sup\big(x, \sup_{0 \le y \le t} G(t, y, \bar r\,1)\big) \le E \sup\big(x, \sup_{0 \le y \le t} G(t, y, r)\big) \quad \text{for all } x.$$

Since $E \sup_\gamma(a_\gamma) \ge \sup_\gamma(E(a_\gamma))$ for any collection of random variables $\{a_\gamma\}$, since $\sup(x, \sup_\gamma(a_\gamma)) = \sup_\gamma(\sup(x, a_\gamma))$, and since U(t) and A(t) are independent, we obtain

$$E_{A,U} \sup\big(x, \sup_{0 \le y \le t} G(t, y, r)\big) \ge E_U \sup_{0 \le y \le t} \big( E_A \sup(x, G(t, y, r)) \big)$$

$$\ge E_U \sup_{0 \le y \le t} \big( \sup(x, E_A G(t, y, r)) \big) = E_U \sup_{0 \le y \le t} \big( \sup(x, G(t, y, \bar r\,1)) \big).$$

The last equality uses $E(R(s, r)) = \bar r s$, which holds because the initial speaker state A(0) is drawn from the stationary distribution.

Remark: Our proof did not rely on U(t) consisting of Poisson arrivals with exponential length messages, and it should work for any "nonpathological" bit arrival process, as long as arrivals and speaker activity are independent. Similarly, one should also be able to extend it to more general stationary service processes.
VII.  A CONTROL PROBLEM

So far, we have discussed methods for analyzing the data queue performance as a function of given service rates that depend only on the speaker activity A. From the discussion of the TDM architecture, it is evident that there is no particular "physical" reason to change the allocation only when speaker activity changes. That is, for purposes of transmission, voice and data blocks are indistinguishable, and one can easily vary the allocation from frame to frame. In an idealized model then, one might assume that the allocation can be varied "instantaneously" (i.e. neglect the frame structure) and seek an optimal control (allocation policy) with respect to some overall cost for voice and data performance. For the data component of this cost, one can take some function of the delay or backlog. For voice, one might consider two types of costs. First, one might assume that speech is kept in some finite buffer with overflow speech being discarded, and then take the voice cost as some function of the delay/loss. Alternatively, one can assume that speech is not buffered but that there is simply a "fidelity cost per unit time" h(i, r) of encoding the output of i active speakers with (C - r) bps. (C = link capacity.) The average cost

$$\lim_{T \to \infty} E\left( \frac{1}{T} \int_0^T h(A(t), r(t))\,dt \right)$$

can then be taken as the voice component of the total cost. We adopt this structure.
If r depends only on A, then the voice cost equals its ensemble average, $K(r) = \sum_{j=0}^{N} p_j\, h(j, r_j)$, since the speaker chain is ergodic. If the data cost can also be expressed as some "closed form" function $\bar T(r_0, \ldots, r_N)$, then the optimization problem "reduces" to a static problem, e.g. $\min_{r_0, \ldots, r_N} \bar T(r_0, \ldots, r_N) + K(r_0, \ldots, r_N)$, or $\min \bar T(r_0, \ldots, r_N)$ subject to $K(r_0, \ldots, r_N) \le h_0$, etc. Such a "simple form" for the data cost does not appear to be forthcoming. However, by using the method of Chapter V, we can, in principle, "optimize by numerical trial and error", i.e. $\bar T$ can be evaluated numerically for any $r_0, \ldots, r_N$ (in the M/M case anyway) and, presumably, K can be evaluated since the $p_j$ are known.
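As an illustration of how K(r) could be evaluated once the $p_j$ are known, the sketch below uses a binomial speaker distribution and a made-up fidelity cost. Both the cost function h(i, r) = i^2/(C - r) and the numerical values are assumptions chosen for the example; the thesis does not specify h.

```python
from math import comb


def binomial_speaker_distribution(n_speakers, activity):
    """p_j = Pr[j of n_speakers are active], each active independently with probability `activity`."""
    return [
        comb(n_speakers, j) * activity ** j * (1.0 - activity) ** (n_speakers - j)
        for j in range(n_speakers + 1)
    ]


def fidelity_cost(active, data_rate, capacity):
    """Hypothetical h(i, r): i active speakers share the C - r bps left to voice."""
    return active ** 2 / (capacity - data_rate)


def voice_cost(probs, data_rates, capacity):
    """Ensemble average K(r) = sum_j p_j h(j, r_j)."""
    return sum(p * fidelity_cost(j, r_j, capacity)
               for j, (p, r_j) in enumerate(zip(probs, data_rates)))


probs = binomial_speaker_distribution(n_speakers=2, activity=0.5)
k_value = voice_cost(probs, data_rates=[3.0, 2.0, 1.0], capacity=4.0)
print(probs, k_value)
```

A numerical trial-and-error search would call a routine of this kind together with a numerical evaluation of the data cost for each candidate rate vector.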

If the data service rate, r, is allowed to depend on speaker activity, the data backlog size, and perhaps time explicitly, the problem is more complicated, and we have no procedure for finding the equilibrium behavior of the data queue for a fixed policy. (To begin with, the notion of "stability" is more complicated. For example, one can conceive of policies that keep the backlog bounded, but that do not lead to a well-defined "steady-state" behavior.) However, it is sometimes possible to characterize an optimal control, without "reducing" the problem to the static case by finding a formula relating cost and control. As a first step in exploring the complete stochastic control problem, we have managed to characterize the optimal control for the much simpler problem of emptying an initial backlog, assuming no data arrivals or speaker activity changes. The formulation and analysis follow.


Backlog Emptying Problem

• X(t) denotes the backlog at time t, X(0) > 0.

• r(t) is the data service rate so that

$$X(t) = \left( X(0) - \int_0^t r(s)\,ds \right)^+ .$$

We allow r to be piecewise continuous with at most a finite number of jump discontinuities. At such a jump, we choose r so that it is right continuous. r(t) is constrained to be in [0, C] for each t.

• The voice cost per unit time is h(r). We assume h(0) = 0, h(r) nondecreasing and piecewise differentiable on either [0, C) or [0, C], i.e. we allow $h(r) \to \infty$ as $r \to C$. At r = 0, we take the right-sided derivative $h'(0^+)$, and at r = C, we take the left-sided derivative $h'(C^-)$ (which might be infinite). At all other points 0 < r < C, both $h'(r^+)$ and $h'(r^-)$ exist, and $h'(r^+) = h'(r^-)$ at all but a finite number of points.

• The total cost to be minimized is

$$J(r) = \int_0^\infty X(t)\,dt + \int_0^\infty h(r(t))\,dt.$$

Remark: The time, T, at which X reaches 0 is a free parameter in the problem, and clearly, an optimal control $r^*(t)$ must be zero if t > T. A priori, T might be infinite for some h, i.e. X(t) approaches zero without ever reaching it. We will show that if there is an optimal control, then this cannot be the case.


Necessary Condition for Optimality

Suppose $t_1$ and $t_2$ are two times s.t. $r^*(t_1) > 0$, $r^*(t_2) < C$. By right continuity, there exist $\delta, \Delta > 0$ such that the control

$$r(t) = \begin{cases} r^*(t) - \delta, & t \in [t_1, t_1 + \Delta] \\ r^*(t) + \delta, & t \in [t_2, t_2 + \Delta] \\ r^*(t), & \text{otherwise} \end{cases}$$

is admissible.

[Figure: X(t)]

To first order in $\Delta, \delta$, the difference in costs $J(r) - J(r^*)$ is

$$(t_2 - t_1)\,\delta\Delta + \delta\Delta\, \frac{h(r^*(t_1) - \delta) - h(r^*(t_1))}{\delta} + \delta\Delta\, \frac{h(r^*(t_2) + \delta) - h(r^*(t_2))}{\delta}.$$

If $r^*$ is optimal, this must be nonnegative, so we obtain

$$t_2 + h'(r^+)\Big|_{r = r^*(t_2)} \;\ge\; t_1 + h'(r^-)\Big|_{r = r^*(t_1)} \qquad (1)$$

(We have shown $t_1 < t_2$ in the picture, but this was not used in deriving (1), i.e. (1) is a property of an optimal control at any times $t_1$, $t_2$ meeting the specified conditions. In particular, if $C > r^*(t_1), r^*(t_2) > 0$, (1) also holds with the roles of $t_1$ and $t_2$ reversed.)

We can now show that there is a time T at which X does reach 0 (assuming an optimal $r^*$ exists). First, we note that, trivially, there is a control with finite cost, e.g. $r = r_0\, I(0 \le t < X(0)/r_0)$ where I(·) is the indicator function and $0 < r_0 < C$ (since $h(r) < \infty$ if r < C). Second, we note that an optimal control must be nonincreasing. To see this, observe that if $r^*$ is increasing on [a, b], then the control

$$r(t) = \begin{cases} r^*(t), & t \notin [a, b] \\ r^*(b - (t - a)), & t \in [a, b] \end{cases}$$

(which just reverses $r^*$ in time over [a, b]) has the same voice cost but lower data cost.
Now, evidently, there is some time $t_0$ at which $r^*$ assumes a value $r_0$ less than C. Since $h'(r^+)$, $h'(r^-)$ are nonnegative and $h'(r^+) < \infty$ if r < C, it follows from (1) that $r^*(t)$ can be positive only if $t \le t_0 + h'(r_0^+) < \infty$. Thus there is some time $t_1$ at which $r^*(t_1) = 0$. By monotonicity, $r^*(t) = 0$ for $t > t_1$, so if $X(t_1) \ne 0$, $J(r^*)$ is infinite and $r^*$ cannot be optimal. Letting $T = \inf\{t : X(t) = 0\}$, it follows from the continuity of X(t) ($r^*$ has no "impulse") and the right continuity of $r^*$ that $X(T) = r^*(T) = 0$; $X(t) > 0$, $r^*(t) > 0$, for t < T.

* 

Since $r^*(t) > 0$, t < T, we can apply (1) to obtain, for 0 < t < T,

$$h'(r^-)\Big|_{r = r^*(T-t)} \le t + h'(0^+) \qquad (2)$$

$$t + h'(r^-)\Big|_{r = r^*(T^-)} \le h'(r^+)\Big|_{r = r^*(T-t)} \quad \text{if } r^*(T-t) < C. \qquad (3)$$

(In (3), if $r^*(T^-) = 0$, we mean $\lim_{r \to 0} h'(r^-)$, which must be $h'(0^+)$.) It follows from either (2) or (3) that $h'(r^-)\big|_{r = r^*(T^-)} \le h'(0^+)$.

Now we can use these conditions to construct $r^*$.


Case 1: h is convex (∪). In this case, $h'(r^\pm) \ge h'(0^+)$ for all r > 0. Thus, by the previous remark, we can conclude $h'(r^-)\big|_{r = r^*(T^-)} = h'(0^+)$, so from (2) and (3) we obtain

$$h'(r^-)\Big|_{r = r^*(T-t)} \le t + h'(0^+) \qquad (4)$$

$$h'(r^+)\Big|_{r = r^*(T-t)} \ge t + h'(0^+) \quad \text{if } r^*(T-t) < C. \qquad (5)$$

The basic idea is to work "backwards" in time from T and use (4) and (5) to pick off a value for $r^*(T-t)$. The construction stops when, for some $t_0$, $\int_{T-t_0}^{T} r^*(s)\,ds = X(0)$, so that one then "redefines the origin" and $T = t_0$, i.e. the free parameter T is determined by this condition. We illustrate the possibilities by the following example.


[Figure: example cost function h(r), with a nondifferentiable point and a linear portion]

In the region $0 \le r < r_1$, h is strictly convex and differentiable so (4) and (5) hold with equality, and there is a unique solution, $r^*(T-t) = [h']^{-1}(t + h'(0^+))$. For t in the region $h'(r_1^-) - h'(0^+) \le t < h'(r_1^+) - h'(0^+)$, any $r \le r_1$ satisfies (4) but only $r = r_1$ satisfies (5). Thus $r^*(T-t)$ sits at $r_1$ until t reaches $h'(r_1^+) - h'(0^+)$, and it then proceeds up the next portion of the curve. Once t exceeds $h'(r_2)$, any $r_2 \le r \le C$ satisfies (4), but if r < C, (5) does not hold. Thus r = C must be chosen and then maintained until the condition specifying T is met. More generally, we see that a linear portion in h causes a jump discontinuity, and a nondifferentiable point causes $r^*$ to remain constant for some time. Otherwise $r^*(T-t)$ increases monotonically and continuously with t.
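The backward construction can be made concrete for one simple convex cost. The sketch below assumes h(r) = r^2/2, which is a choice made purely for illustration: then h'(r) = r and h'(0+) = 0, so (4) and (5) give r*(T - t) = min(t, C), and T follows from requiring the served bits to equal X(0). The code compares the resulting cost J with that of the best constant-rate control, using a simple Riemann sum.

```python
import math


def emptying_time(x0, capacity):
    """Emptying time T under the constructed control, for the assumed h(r) = r**2 / 2."""
    if x0 <= capacity ** 2 / 2.0:
        return math.sqrt(2.0 * x0)           # the rate never reaches the cap C
    return x0 / capacity + capacity / 2.0     # the rate sits at C for an initial stretch


def optimal_rate(t, x0, capacity):
    """r*(t) = min(T - t, C) for t < T, and 0 afterwards."""
    time_to_go = emptying_time(x0, capacity) - t
    return min(max(time_to_go, 0.0), capacity)


def total_cost(rate, x0, horizon, dt=1e-3):
    """Riemann-sum approximation of J = integral of X(t) + h(r(t)), with h(r) = r**2 / 2."""
    backlog, cost = x0, 0.0
    for k in range(int(round(horizon / dt))):
        r = rate(k * dt) if backlog > 0.0 else 0.0
        cost += (backlog + 0.5 * r * r) * dt
        backlog = max(backlog - r * dt, 0.0)
    return cost


x0, capacity = 2.0, 10.0
j_opt = total_cost(lambda t: optimal_rate(t, x0, capacity), x0, horizon=5.0)
# For a constant rate r0 the cost is x0**2 / (2 * r0) + x0 * r0 / 2, minimized at r0 = sqrt(x0).
j_const = total_cost(lambda t: math.sqrt(x0), x0, horizon=5.0)
print(j_opt, j_const)
```

With these numbers the constructed control serves fast at first and tapers linearly to zero, and its cost comes out below that of the best constant rate, in line with the monotonicity argument.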

Case 2: h is not convex. This reduces to the previous case as follows. Let $\hat h$ be the convex hull of h. Then, since $\hat h \le h$, an optimal policy for $\hat h$ which only uses values for r at which h and $\hat h$ agree, must also be optimal for h. Now in an interval (a, b) in which h and $\hat h$ disagree, $\hat h$ is linear and "joins" h at the endpoints. By the previous construction, an optimal policy for $\hat h$ will only use a or b if it uses any r in [a, b].
VIII.  FURTHER RESEARCH

• Control Problem - Our aim has been to determine the extent to which data and voice can share a link's transmission capacity. A basic theme has been the "service demand vs. service supply" mismatch problem, and we have seen that changes which allow faster movement between "high" and "low" data service states improve performance. The analysis of data queue performance for fixed rates $r_0, \ldots, r_N$ was roughly based on the assumption that, because of its delay requirements, voice must have "nearly" complete priority, i.e. r cannot change until A changes. (One could perhaps optimize data queue performance over choices of $r_0, \ldots, r_N$ that yield equivalent overall speech quality levels, but there is not too much flexibility.) An important conceptual question is whether a "dynamic control", i.e. r depends on voice activity and backlog size, offers substantial improvement. The following "heuristic" argument indicates a possible reason for expecting improvement. It seems that a large part of the delay in queuing systems is incurred by a small percentage of customers, in particular, those arriving during or right after "surges" which overwhelm the server. A numerical example supporting this "interpretation" can be found by considering a truncated M/M/1 queue, with room for say L customers. (Overflow is "lost".) For $\rho = .9$ and L = 20 the loss is 1.3%, and the average number in system is .6 . At L = 10, the loss is 5%, and average number in system is .26 . The same reduction in delay would not occur if we rejected 1.3% or 5% of customers at random, i.e. the finite buffer "targets" the rejection during surges. In the context of the voice/data queue, this might mean that we can significantly improve data performance by providing brief but large "bursts" of capacity at the right moments, i.e. by "backing off" on voice priority at certain times we might be able to improve data performance while maintaining acceptable speech quality. It is not clear how brief these bursts can be and still help the data.
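The truncated-queue comparison can be explored with a few lines of code. The sketch below assumes that "room for L customers" means the system holds at most L customers including the one in service, so the stationary probabilities are proportional to $\rho^k$ for k = 0, ..., L; the function names are invented for the example. It computes the loss fraction and contrasts the mean number in system with that of an untruncated M/M/1 queue from which the same fraction of customers is rejected at random.

```python
def truncated_mm1_distribution(rho, room):
    """Stationary distribution of an M/M/1 queue holding at most `room` customers."""
    weights = [rho ** k for k in range(room + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loss_probability(rho, room):
    """Fraction of arrivals that find the system full and are lost."""
    return truncated_mm1_distribution(rho, room)[-1]


def mean_number_in_system(rho, room):
    return sum(k * p for k, p in enumerate(truncated_mm1_distribution(rho, room)))


def mean_number_with_random_rejection(rho, rejected_fraction):
    """Untruncated M/M/1 whose arrivals are thinned at random by `rejected_fraction`."""
    thinned = rho * (1.0 - rejected_fraction)
    return thinned / (1.0 - thinned)


for room in (20, 10):
    loss = loss_probability(0.9, room)
    print(room, loss, mean_number_in_system(0.9, room),
          mean_number_with_random_rejection(0.9, loss))
```

Under this assumption the loss fractions come out close to the 1.3% and 5% quoted in the text, and the truncated queue holds fewer customers on average than the randomly thinned one.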

• More Queuing Inequalities - We conjecture that other queuing inequalities of the type in VI.C are true. One might consider inequalities involving system time, number in system, backlog etc. and various notions of "stochastically smaller", but the basic conjectures fall into these categories.

  • All other things being equal, performance improves monotonically as $\lambda, \mu$ increase, $\lambda/\mu$ fixed.

  • All other things being equal, performance improves as one "equalizes" service rates, i.e. raise some $r_i$ that is below $\bar r$ and reduce some $r_j$ above $\bar r$ in such a way that the average is maintained. Since the vector $\bar r\,1$ is the "ultimate equalization", a result of this type involving the backlog and the stochastic inequality measure $\le_2$ would imply our previous result in VI.C.
• Matrix-Geometric Analysis With Other Voice Models - The matrix-geometric approach works with any underlying Markov chain for the phase process. It would be interesting to carry out the numerical calculations for the three state single speaker model.

[Figure: three state single speaker model]

(In this case, the phase process for N speakers would be the Markov process (T(t), S1(t)), see I.C.) This model is more realistic for small numbers of speakers since the effects of two different types of silences do not "wash out" until, apparently, N is 25-30. Note that in this model, a substantial portion of the silence time comes from the "long, infrequent" silences in S2, i.e. the alternation between talkspurt and a part of the silences is on a slower scale. Thus, one would expect degraded performance.


• General Length Distributions (i.e. the M/G case) - If message lengths are not exponentially distributed, then the process [A(t), K(t)] is not Markov. However, the process [A(t), X(t)], X(t) = bit backlog, is, and one can derive coupled integro-differential equations for the joint equilibrium probabilities $G_i(x) = \Pr[A = i, X \le x]$. (The equations are similar to those in the flow model in Chapter VI, except that the discrete nature of message arrivals leads to terms which are convolutions of the $G_i(x)$ and the message length distribution.) These equations have been studied by Halfin and Segal [26], [27]. Since they are more difficult to handle (than the matrix-geometric approach), one would be interested in knowing the "sensitivity" of the data queue performance to the message length distribution, i.e. for what purposes can one assume exponentially distributed message lengths.

2 

The  equation  a  ■  a  ♦  £  a  .  The  expressions  for  a  and  ji 

are  relatively  "simple",  and,  more  important,  the  basic 

parameters  of  the  problem  appear  in  a  rather  "direct" 

way.  It  would  be  nice  to  have  some  relation  (or  bound) 

on  some  quantities  pertaining  to  a  (e.g.  eigenvalues) 

in  terms  of  the  corresponding  quantities  for  a  and  £  . 

Also,  the  convergence  rate  of  the  iteration  when 
T 

H  (a  -  £)1  <  0,  i.e.  when  we  are  computing  the  strictly 
substochastic matrix a*, needs to be determined.
APPENDIX A.


The exact expression for $T_n$ is

$$T_n = \frac{1}{n \mu\, p_n} \sum_{j=n}^{N} p_j ,$$


so our task is to estimate this, when the $\{p_j\}$ are the terms of a binomial distribution. For convenience, we will only consider symmetric distributions, i.e. $\lambda = \mu$. This does not result in loss of generality since the asymptotic expressions can be appropriately modified for asymmetric distributions. The value of some constant might change, but we are not concerned with this. Also, we will assume $\mu = 1$, since for fixed $\lambda/\mu$, $T_n$ is inversely proportional to $\mu$, i.e. the $\{p_j\}$ depend only on $\lambda/\mu$. Any result we quote is taken from Feller [28], Chapter 6.

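Notations:

• 2N is the number of speakers, so the mean number active is N, and the standard deviation is $\sigma_N = \sqrt{N/2}$.

• Let $\phi(y) = (\sqrt{2\pi})^{-1} \exp(-y^2/2)$, $\Phi(y) = \int_{-\infty}^{y} \phi(t)\,dt$.

• $p(n) = \binom{2N}{n} 2^{-2N}$

• $H(n) = \sum_{j=n}^{2N} p(j)$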

We consider the asymptotic behavior of

$$T_{G(N)} = \frac{H(G(N))}{\mu_{G(N)}\; p(G(N))}, \qquad G(N) = N + f(N).$$

Region  I:  lim  f(N)//?T  <  « 

From  Feller: 

p(G(N) ) - i-  expC-f2  (N)/N) 

/ttTT 


H(G(N))~  1  -  «(£(N)/aN) 


•  • 


/N?  (1  -  S(f(N)/oN  )) 


GOT  N+f (N)  exp(-f2(Hi/N) 


e.g. 


f(N)//TT  -  0,  T*(n)  -  2  /£ 


if  ftN)/,N  -  v, 


exp(-y2/2) 


For  large  values  of  Y  ,  we  can  apply  the  assymptotic  result 
I  -  *(y)  ~  ^  to  obtain  T*(n)~  ,/f  -i=  -  75-^- 


Region  II:  f(N)  ■  X^  where  XN  ♦  «  but  X^/a^  •+  0, 


Let  s(XN)  ■  a 


f  1  A  1  ”  1  ,  f  1  \  £  ”  1  y 

(?)  .  ♦  C-  7)  /N 

'\j  -  -  C*=“  J 

‘  1-3  1(1  -  1)  aN 


.  a2  I  V 
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then  from  Feller,  we  have 


*(X„) 

P(G(N) )  ~  exp(  -  s(XM)) 


'N 


H(G(N) ) — (1  -  2(Xn))  exp  (-S(XN)) 
Combining  this  with  1  -  S(y)  ~  ,  we  obtain 


JN 


G(N)  N+f (N)  *N 


N+XN  °N 


In 

CN 


2xn  °n 


Region  III:  f(N)/N  +  y  ,  0  <  y  <  1 

From  the  simple  one  step  bounds  ~  <  T"  <  - — L__ 

U„  -  n  —  u_  -  Xr 


n  n 


we  obtain 


N+f (N) 


-  tg(N)  -  TfTfl)-  ’  For  in  this 


region  ,  these  become  — - -  <  TrfM-,  <  -i-t  in  the  limit. 

(l+y)N  ”  “  2yN 

We will show that the upper bound is exact, in the asymptotic sense, i.e. for any $\delta > 0$, $\exists\, N_\delta$ s.t. $\forall\, N > N_\delta$


G(N)  -  2y n 


IT 
Hereafter, all $\ge$ statements used in the asymptotic sense will be in the above context, and we write ~ without going through the "for any $\delta$, $\exists\, N$" argument in a formal way.


Let  j  be  a  given  positive  integer.  Then 


Nou  W 


2N  m 

(  J  n 

J  i-1 


Now fix an integer k. Because each term in the sum is positive

and  because,  for  fixed  x,  *-s  decreasing  in  y,  we  obtain 

x+y 

the  bound 


Now  observe  that  if  f(N)/N  0  as  N  -*•  »,  then  (fri+'f 

as  N  -*■  00 ,  and  the  bound  is  useless.  However,  if  f(N)/N  -*■  y  , 
then  this  expression  approaches 


Since  k  is  arbitrary,  we  can  conclude 


1 
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